Replacement for GMP: Update
Peter Tanski
p.tanski at gmail.com
Wed Jan 24 10:30:02 EST 2007
On Tue, 23 Jan 2007, John Meacham wrote:
> I think the benchmarks are flawed in an important way. I believe (but
> am not positive) that ARPREC uses a special factorized form of
> representing numbers, which makes multiplicative-type operations
> extremely fast, but simple things like addition/subtraction quite
> slow.
Oh, don't worry. I've given up on ARPREC for the simple reason that
*every* operation on small-size operands is very slow, to say nothing
of the time it would take to initialise ARPREC every time. ARPREC
does store numbers in a representation similar to floating point:
essentially an array of doubles, where
(int)(array[0]) = array_size
(int)(array[1]) = no_mantissa_words
array[2]        = exponent
array[3 ..]     = mantissa
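For illustration, a rough C sketch of reading that layout; the field
names are just my own labels, not ARPREC's actual API:

#include <stdio.h>

/* Read the bookkeeping fields of an ARPREC-style number: an array of
   doubles whose first three entries are metadata and whose tail is
   the mantissa.  Illustrative only, not ARPREC code. */
static void describe(const double *a)
{
    int array_size     = (int) a[0];    /* total array length          */
    int mantissa_words = (int) a[1];    /* mantissa words in use       */
    double exponent    = a[2];          /* exponent, kept as a double  */
    const double *mantissa = &a[3];     /* mantissa digits start here  */

    printf("size=%d, mantissa words=%d, exponent=%g, first digit=%g\n",
           array_size, mantissa_words, exponent,
           mantissa_words > 0 ? mantissa[0] : 0.0);
}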
I hope the tests aren't "flawed," in that the main purpose of
performing those operations was to see mid-range performance (extreme
performance being Integers with operands greater than 10,000 or
30,000 bits). The crummy graphs do show that I made a low cutoff of
256 bits, so the most common Integer use (< 128 bits, 4 uint32_t)
wasn't even tested. I probably didn't clarify why I tested
larger-size operands: GHC should have something that is comparable to
GMP, and that means speed (and precision) for medium-to-large
operands as well as small ones.
> You are only benchmarking multiplicative or similar routines, giving
> ARPREC a huge lead, when in practice it might end up being slowest,
> as addition/subtraction are far more common than multiplication.
In the common case it is slowest--but not by much.
> Also, how are you choosing the numbers to test with? It is possible
> that some packages are using 'sparse' representations or other
> specialized forms if all your numbers happen to be powers of two or
> something.
Random numbers--a fresh number of the requested size for each
iteration. I used a PRNG based on the SNOW2 stream cipher--something
I wrote a while ago; it is as fast as arc4random and tests well under
DIEHARD and other statistical suites. The GMP and OpenSSL random
number generators were slower, and I wanted to use the same generator
across all the libraries.
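Roughly how one might generate such operands--a sketch with a generic
word-level source standing in for the SNOW2-based generator;
rand_word() and random_operand() are placeholders, not the actual
test code:

#include <stdint.h>
#include <stdlib.h>

/* Placeholder word source; the real tests used a SNOW2-based PRNG.
   Any uniform 32-bit generator would do for illustration. */
static uint32_t rand_word(void)
{
    return ((uint32_t) rand() << 16) ^ (uint32_t) rand();
}

/* Fill 'limbs' 32-bit words with random bits and force the top bit
   on, so every operand is exactly 32*limbs bits wide. */
static void random_operand(uint32_t *out, size_t limbs)
{
    for (size_t i = 0; i < limbs; i++)
        out[i] = rand_word();
    out[limbs - 1] |= UINT32_C(1) << 31;
}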
> Also, pretty much all uses of integers will be for very small
> integers, so we should be careful not to lose sight of speed for the
> common case due to pretty asymptotic bounds.
Both operands in each test were the same size, so cases like
multiplying a 1024-bit operand by a 256-bit operand weren't tested;
that could be a real flaw. It's all very sloppy--just meant to give
an idea of where things generally line up.
So, where am I now? I worked on GHC-Win off and on, then went back to
making a uniform API between GHC and the replacement, and I am now
re-writing the replacement. I thought I would be done several weeks
ago, but of course little things take over...

One area of GHC I would really like to change is the memory use.
ForeignPtr seems to work well but places the full burden of memory
management on the Integer library; for SIMD use (AltiVec, SSE2), the
library would require GHC to allocate memory aligned to 16 bytes.
Right now, the only choice would be allocatePinned() (in
rts/sm/Storage.c), which is 4-byte aligned and is, of course, pinned,
so the GC can't move it. Imagine the RTS memory problems
(fragmentation from blocks the GC cannot move) you could have if you
wrote a Haskell program using lots of small Integers allocated with,
say, an allocatePinned_16() (16-byte aligned). The alternative would
be to use normal, movable memory and re-align the operands for the
vectorized operations by peeling: handle the unaligned leading words
with scalar code, then switch to aligned vector loads. OK, doable
(especially easy with AltiVec), but slower.
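To make the peeling idea concrete, here is a minimal C sketch using
SSE2 intrinsics; the operation (a word-wise XOR) and the function
name are only illustrative, and it assumes both arrays share the same
16-byte offset:

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

/* Word-wise XOR of two uint32_t arrays, peeling unaligned leading
   elements so the vector loop can use aligned 16-byte loads.  A toy
   operation chosen only to show the peeling structure; both arrays
   are assumed to share the same alignment offset. */
static void xor_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* Peel: scalar until dst reaches a 16-byte boundary. */
    while (i < n && ((uintptr_t) (dst + i) & 15) != 0) {
        dst[i] ^= src[i];
        i++;
    }

    /* Aligned SSE2 body: 4 words per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_load_si128((const __m128i *) (dst + i));
        __m128i b = _mm_load_si128((const __m128i *) (src + i));
        _mm_store_si128((__m128i *) (dst + i), _mm_xor_si128(a, b));
    }

    /* Tail: remaining words, scalar again. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}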
In the meantime I have been working through SSE2 assembler, which has
no add-with-carry (addc) instruction and sets no overflow flag, so I
have been experimenting with a SWAR-like algorithm--essentially the
same thing as GMP's nails--that makes the normally 32-bit limbs 31
bits wide and reserves the top bit as a carry flag.
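A minimal sketch of that nails-style addition in C with SSE2
intrinsics--the function name and the scalar carry pass are mine,
assuming 31-bit limbs, 16-byte-aligned arrays, and a limb count that
is a multiple of 4:

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

#define LIMB_MASK UINT32_C(0x7fffffff)   /* low 31 bits hold the limb */

/* Add two arrays of 31-bit limbs stored one per 32-bit word.  SSE2
   has no add-with-carry and sets no flags, so each lane's carry
   appears as bit 31 of its 32-bit sum; a scalar pass then propagates
   the carries.  Assumes n is a multiple of 4 and both input arrays
   and r are 16-byte aligned. */
static uint32_t add_limbs(uint32_t *r, const uint32_t *a,
                          const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128i x = _mm_load_si128((const __m128i *) (a + i));
        __m128i y = _mm_load_si128((const __m128i *) (b + i));
        /* Lane-wise 32-bit adds; each sum fits because operands < 2^31. */
        _mm_store_si128((__m128i *) (r + i), _mm_add_epi32(x, y));
    }

    /* Fold each limb's carry (bit 31) into the next limb up. */
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = r[i] + carry;     /* <= 2^32 - 1, no wraparound  */
        carry = s >> 31;               /* the nail bit is the carry   */
        r[i] = s & LIMB_MASK;
    }
    return carry;                      /* carry out of the top limb   */
}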
Thorkil Naur and others have suggested writing the whole thing as
small assembler primitives and piecing them together in Haskell; I
have been looking into that as well, but it seems to entail inlining
every Integer function--imagine the code bloat.
Cheers,
Pete