GMP unboxed number representation
Hans Aberg
haberg@matematik.su.se
Fri, 4 May 2001 23:08:59 +0200
At 19:06 +0200 2001/05/04, Marcin 'Qrczak' Kowalczyk wrote:
>> Let's focus on integers.
>
>ghc doesn't use other gmp types.
GMP has a very interesting multi-precision floating-point type as well,
but it has the same problem as the integers: it does not use native
doubles for small floats, so it probably becomes slow.
>> One problem though is the GC (Garbage Collector):
>
>Indeed. ghc solves it by using the gmp format only temporarily to perform
>operations, and keeping numbers in its own structures.
Does that mean that you are getting extra memory allocations, or are you
simply handing over a suitable mpz and then picking up the _mp_d pointer
in your own format?
The latter sounds interesting; perhaps one could use it in a C++ wrapper.
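Concretely, what I picture is something like this (a rough sketch only: it
pokes the mpz internals directly, using the field names _mp_alloc, _mp_size
and _mp_d from gmp.h, and the client-side struct is of course made up):

  #include <gmp.h>

  // Made-up client-side number format: a signed limb count plus the
  // limb block that was (or will become) _mp_d.
  struct MyInteger {
      int size;            // like _mp_size: the sign carries the number's sign
      mp_limb_t *limbs;    // like _mp_d
  };

  // View a client-side number as an mpz for the duration of a gmp call,
  // as a read-only input operand, without any copying or allocation.
  // Breaks as soon as the mpz layout changes, since it bypasses the
  // public interface.
  static void view_as_mpz(mpz_t z, const MyInteger &n)
  {
      z->_mp_alloc = n.size < 0 ? -n.size : n.size;
      z->_mp_size  = n.size;
      z->_mp_d     = n.limbs;
  }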
>> Then, in order to avoid a dynamic allocation for the ref count
>> as well, one wants the object pointed to by _mp_d above to contain
>> all data,
>
>Attaching data to _mp_d can be done without changing gmp, but also
>without the ability to mix in another library using gmp in the same
>program: by providing appropriate allocation functions, which allocate
>more, fill the header and return a shifted pointer to gmp. This is what
>ghc does.
Yes, this is the approach I suggested, but then, as a C library, the one
with the different allocation functions becomes binary incompatible with
the regular library.
Perhaps I thought in too simplistic patterns; if I put the pointer shifting
in the C++ library, I might automate it. But then the C++ library becomes
incompatible with the C library. Perhaps it does not matter.
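For concreteness, this is roughly the kind of allocation functions I
picture being installed through gmp's mp_set_memory_functions hook (a
sketch only: the header contents are made up, and error handling is left
out):

  #include <gmp.h>
  #include <cstddef>
  #include <cstdlib>

  // Made-up header placed in front of every limb block gmp allocates.
  struct BlockHeader {
      long refcount;
  };

  static void *shifted_alloc(size_t n)
  {
      // Allocate room for the header plus what gmp asked for, and hand
      // gmp a pointer just past the header.
      BlockHeader *h =
          static_cast<BlockHeader *>(std::malloc(sizeof(BlockHeader) + n));
      h->refcount = 1;
      return h + 1;
  }

  static void *shifted_realloc(void *p, size_t, size_t new_n)
  {
      // Shift back to the real start of the block before reallocating.
      BlockHeader *h = static_cast<BlockHeader *>(p) - 1;
      h = static_cast<BlockHeader *>(
              std::realloc(h, sizeof(BlockHeader) + new_n));
      return h + 1;
  }

  static void shifted_free(void *p, size_t)
  {
      std::free(static_cast<BlockHeader *>(p) - 1);
  }

  void install_shifted_allocation()
  {
      // After this, every _mp_d block gmp hands out has a header in
      // front of it that the wrapper can find by shifting back.
      mp_set_memory_functions(shifted_alloc, shifted_realloc, shifted_free);
  }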
>I think you would have to be careful to not use gmp functions which
>change numbers in place. In Haskell there is no problem because the
>interface is functional anyway, but a C++ user might want to have
>++z performed in place. When the number can be shared, it doesn't
>work. So either do it functionally or copy by value. In either case
>sometimes you lose. Leaving gmp memory management to the programmer
>would make using it awful.
I do not think that the standard C++ library containers give such
guarantees (say for std::string); one merely builds an interface. (As for
the ref count, I use a function detach() which detaches any object about
to be mutated from its reference cluster before the mutation.) So it
should not be a problem.
But for optimization it could be interesting to have GMP functions that
operate in place. One can use some interesting mixtures, say sharing only
the limb data behind the pointer while copying the sign, in order to make
sign changes fast.
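To make the detach() point concrete, here is a minimal sketch of such a
ref-counted wrapper (the class and member names are invented; only the gmp
calls are real); with it, ++z can safely be performed in place once the
body is unshared:

  #include <gmp.h>

  class Int {
      struct Body {
          long refs;
          mpz_t value;
      };
      Body *body;

      void detach()
      {
          // Called before every mutating operation: if the body is
          // shared, replace our pointer with a private copy.
          if (body->refs > 1) {
              Body *copy = new Body;
              copy->refs = 1;
              mpz_init_set(copy->value, body->value);
              --body->refs;
              body = copy;
          }
      }

      void release()
      {
          if (--body->refs == 0) {
              mpz_clear(body->value);
              delete body;
          }
      }

  public:
      Int(long n) : body(new Body)
      {
          body->refs = 1;
          mpz_init_set_si(body->value, n);
      }
      Int(const Int &other) : body(other.body) { ++body->refs; }
      ~Int() { release(); }
      Int &operator=(const Int &other)
      {
          ++other.body->refs;
          release();
          body = other.body;
          return *this;
      }
      Int &operator++()
      {
          detach();                      // safe to mutate in place now
          mpz_add_ui(body->value, body->value, 1);
          return *this;
      }
  };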
>> So here the question comes in: if GMP should now serve, say, the
>> implementation of compilers such as GHC, what kind of small-number
>> representations should one use?
>
>I don't know how to do it better than currently: letting the
>program/library which uses gmp manage gmp's memory.
>
>> So this suggests that it might be a disadvantage for GHC to have a
>> GMP merging the two number representations.
>
>It would be an unnecessary complication, but manageable. Since ghc
>creates mpz objects for each operation, it could use only the big
>representation of gmp on inputs, handling small numbers itself as
>currently. But it must be prepared to receive a small result. It would
>be silly to wrap it in allocated memory if gmp produced a small
>answer, so primops (implemented in C or assembler) should return
>either representation.
Perhaps GMP should provide small-number arithmetic, with a fast way to
determine whether the answer fits in a small-number representation. For
example mpz_si_si, taking two (long) ints, but with the result returned
through a pointer. If one hands over an allocation at least two limbs
long, this function does not need to allocate anything, and the idea is
that one should then be able to look at the result and quickly determine
whether it fits into a single int.
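A purely hypothetical sketch of what I mean (no such function exists in
gmp today; the names and the return convention are just for illustration):

  #include <gmp.h>
  #include <climits>

  // Hypothetical primitive: add two longs, write the magnitude into a
  // caller-supplied buffer of at least two limbs, and return the signed
  // limb count.  Assumes mp_limb_t is at least as wide as unsigned long.
  static int add_si_si(mp_limb_t *d, long a, long b)
  {
      mp_limb_t ma = a < 0 ? -(mp_limb_t) a : (mp_limb_t) a;   // |a|
      mp_limb_t mb = b < 0 ? -(mp_limb_t) b : (mp_limb_t) b;   // |b|
      if ((a < 0) == (b < 0)) {            // same sign: magnitudes add
          d[0] = ma + mb;
          if (d[0] < ma) { d[1] = 1; return a < 0 ? -2 : 2; }  // carried
          if (d[0] == 0) return 0;
          return a < 0 ? -1 : 1;
      }
      if (ma == mb) return 0;              // opposite signs, equal magnitude
      if (ma > mb) { d[0] = ma - mb; return a < 0 ? -1 : 1; }
      d[0] = mb - ma;
      return b < 0 ? -1 : 1;
  }

  // Caller-side check: the result is still "small" when it uses at most
  // one limb and that limb fits back into a long (conservatively
  // ignoring the LONG_MIN corner case).
  static bool fits_in_long(int size, const mp_limb_t *d)
  {
      return size == 0 ||
             ((size == 1 || size == -1) && d[0] <= (mp_limb_t) LONG_MAX);
  }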
>It would be a bit ugly. Ghc's primops use a restricted set of types
>and rarely allocate memory themselves. For example plusInteger#
>has the type
> Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)
>where (# x, y #) means to return both values on the stack.
>It is wrapped for integer addition thus:
>
>plusInteger :: Integer -> Integer -> Integer
>plusInteger i1@(S# i) i2@(S# j) = case addIntC# i j of { (# r, c #) ->
> if c ==# 0# then S# r
> else toBig i1 + toBig i2 }
>plusInteger i1@(J# _ _) i2@(S# _) = i1 + toBig i2
>plusInteger i1@(S# _) i2@(J# _ _) = toBig i1 + i2
>plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
> (# s, d #) -> J# s d
>
>Returning one of two variants from a primop requires expressing it
>similarly to a C struct, without type punning. So it could be this:
> plusInteger# :: Int# -> ByteArr#
> -> Int# -> ByteArr#
> -> (# Int#, Int#, ByteArr# #)
>plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
> (# 0#, i, _ #) -> S# i
> (# _, s, d #) -> J# s d
>
>The third component in the S# case is ignored, but the primop must
>pass something - a pointer to a dummy Haskell object (this technique
>is used in some other primops).
All this checking probably slows down the small format as well.
>This assumes that the condition for having an alternative
>representation is the same in ghc and gmp: fitting into a single
>signed integer of the limb size (i.e. pointer size).
So you are not using GMP as an interface. One should really try to get a
better GMP interface so that programs like GHC can use that instead.
>Note that the J# representation currently relies on the fact that
>mpz can be reconstructed from an integer + an array of words of an
>explicit size. It's not possible to change the representation of
>mpz and keep the J# representation in ghc.
>
>> Or is it so that the dispatch code can only be generated if the
>> input data is sufficiently static, in which case it would be a great
>> advantage to have it in GMP when the data is dynamic?
>
>It's dynamic too. An advantage of making the variants visible to
>the compiler is that a function returning an Integer is compiled to
>code which takes the address of two continuations as an argument,
>and enters the continuation corresponding to the constructor returned,
>passing it the constructor arguments as arguments. It doesn't necessarily
>allocate an S# or J# node on the heap if it's to be evaluated right
>away. This happens to all algebraic types in general (perhaps there
>are size constraints, I don't know). So I guess that dispatching is
>best done by using an algebraic type.
>
>But I'm not sure. The ghc documentation says that ghc loves
>single-constructor types. It was written a long time ago, but perhaps
>it's still true. So how to distinguish representations in another
>way? Well, perhaps
> data Integer = J# Int# Int# ByteArray#
>with a dummy object for the ByteArray# in the short case would work.
>I don't know how well: it's ugly but maybe fast, and it seems easier
>to integrate with a variant representation in gmp.
>
>In any case, judging the performance of ghc for the various
>representations is better done by somebody with more knowledge than me.
>I don't know what the exact overheads are in the various cases.
>
>> You shouldn't worry that anything going on in GMP would not respect
>> upwards compatibility.
>
>ghc abuses gmp by using its internals directly. When the representation
>of mpz changes, ghc must be changed, even if the C interface remains
>compatible.
>
>Even renaming some gmp functions and providing old names as macros
>(or something like this) in gmp3 caused compatibility problems for ghc,
>and ghc-4.08 requires gmp2, because assembler code calls C functions
>directly and cpp couldn't translate the names.
So this, in the end, suggests that one should perhaps get a better GMP
interface, covering both small and large number representations. It should
then be such that GHC could use that interface, rather than abusing GMP's
internals.
>> The GMP developers have so far judged that the effort does not
>> outweigh the advantage of having a type which is faster for small
>> numbers. Do you think that the speed-up for smaller numbers would
>> be significant for such a rewrite?
>
>It is significant. I don't know any numbers, but ghc's handling of
>Integers was said to get a big speedup after the separate representation
>for small integers was introduced.
>
>But this might be partially because calling gmp functions involves a lot
>of indirections, construction of mpz objects etc., so the speedup could
>result from avoiding calling gmp at all. Allocation in ghc is fast.
What I was told a long time ago is that as soon as one leaves the simple
built-in types, one must perform a lot of checks: CPUs are simply not built
for handling multiprecision. Those checks steal a lot of cycles, in
addition to the memory allocation.
>Perhaps in your C++ wrappers you can distinguish the variants above
>gmp, and use gmp at all only for large numbers? C++ doesn't have
>such indirections; you would surely keep the original mpz objects, but
>it might be easier than changing gmp.
There is already a C++ GMP wrapper library using a small-number
representation and a ref count for large numbers, on top of the _mp_d
allocation. But the overflow checks are slow at the C/C++ language level,
and are better moved into assembler where available.
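For comparison, this is roughly what a portable overflow check looks like
at the C++ level (several compares and a branch per operation, against a
single look at the carry/overflow flag in assembler):

  #include <climits>

  // Add two longs, detecting overflow portably.  Returns false when the
  // true result does not fit in a long, so the caller can fall back to
  // the multi-precision representation.
  bool checked_add(long a, long b, long &result)
  {
      if (b > 0 ? a > LONG_MAX - b : a < LONG_MIN - b)
          return false;
      result = a + b;
      return true;
  }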
>Note that I'm not against changing gmp. It's just not obvious what the
>consequences would be...
One must have a good idea of how to go about it before starting to write
something like that. It is an interesting question, though.
Hans Aberg