bignums, gmp, bytestring, .. ?
Peter Tanski
p.tanski at gmail.com
Sat Nov 18 16:46:24 EST 2006
On Nov 17, 2006, at 7:24 PM, Claus Reinke wrote:
> it seems that haskell versions of bignums is pretty much gone from
> more recent discussions of gmp replacements. now, I assume that
> there are lots of optimizations that keep gmp popular that one
> wouldn't want to have to reproduce, so that a haskell variant might
> not be competitive even if one had an efficient representation, but
>
> - do all those who want to distribute binaries, but not dynamically
> linked, need bignums?
You are right: most don't. Even when working with larger numbers, I
have very rarely used bignum libraries myself, mostly because there
is usually a clever--and often faster--way to deal with large
numbers, especially when you don't require all that extra precision.
These methods were better known and relatively widely used before
multi-precision libraries became so widespread, and they have become
even more useful since 64-bit machines and C99's 64-bit integer types
came around.
Integers are mostly a convenience. Large numbers are necessary if
you need very high precision mathematical calculations or if you are
doing cryptography; for that matter, high precision mathematics
usually benefits more from arbitrary precision decimal (fixed or
floating point) for certain calculations.
The simple problem with Haskell and Integer is that, according to the
Standard, Integer is a primitive: it is consequently implemented as
part of the runtime system (RTS), not the Prelude or any library
(though the interface to Integer is in the base library). For GHC,
compiling with -fno-implicit-prelude and explicitly importing only
the functions and types you need won't get rid of Integer.
A possible solution would be to implement the Integer 'primitive' as a
separate library and import it into the Prelude or base libraries,
then perform an optimisation step where base functions are only
linked in when needed. Except for the optimisation step, this
actually makes the job easier since Integer functions would be called
using the FFI and held in ForeignPtrs. (I have already done the FFI-
thing for other libraries and a primitive version of the replacement.)
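As a rough sketch of the ForeignPtr side of that approach (the
BigNat type and the fromLimbs/toLimbs names here are hypothetical,
and mallocForeignPtrArray stands in for whatever allocator the C
side of the replacement library would actually provide):

```haskell
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)

-- Hypothetical limb buffer for an FFI-based Integer replacement:
-- the limbs live in foreign-allocated memory and are freed
-- automatically when the ForeignPtr is garbage collected.
data BigNat = BigNat Int (ForeignPtr Word32)  -- limb count, buffer

fromLimbs :: [Word32] -> IO BigNat
fromLimbs ls = do
  fp <- mallocForeignPtrArray (length ls)
  withForeignPtr fp $ \p -> pokeArray p ls
  return (BigNat (length ls) fp)

toLimbs :: BigNat -> IO [Word32]
toLimbs (BigNat n fp) = withForeignPtr fp $ \p -> peekArray n p
```

An actual replacement would foreign-import the C arithmetic over
two such buffers and hide the IO behind a pure Integer interface.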
> - it would be nice to know just how far off a good haskell version
> would be performance-wise..
There is actually a relatively recent (2005, revised) Haskell version
of an old Miranda library for "infinite" precision floating point
numbers by Martin Guy, called BigFloat, at
http://bignum.sourceforge.net/. Of course, it is floating point and
an Integer version would be faster, but the Haskell-to-C speed ratio
would probably carry over and so be just as disappointing. The
BigFloat library (using the
Haskell version) came in last place at the Many Digits "Friendly"
Competition for 2005 (see
http://www.cs.ru.nl/~milad/manydigits/final_timings_rankings.html),
though you would probably be more
interested in looking at the actual timing results to get a better
idea. (The fastest competitors were MPFR, which uses GMP, and The
Wolfram Team, makers of Mathematica; BigFloat actually beat iRRAM and
Maple solutions for several problems.)
The real problem with an Integer library written in *pure* Haskell
is simple: Haskell is too high-level and no
current Haskell compiler, even JHC, has even remotely decent support
for low-level optimisations such as being able to unroll a loop over
two arrays of uint32_t and immediately carry the result from adding
the first elements from each array to the addition of the next two,
in two machine instructions. I shouldn't have to mention
parallelization of operations. In short, the assembler produced by
any Haskell compiler is *very* ugly, and code using Arrays is even
uglier. (For a simple comparison to the Integer problem, try
implementing a fast bucket sort in Haskell.)
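To make the carry example concrete, here is that inner loop written
naively in Haskell over little-endian Word32 limbs (addLimbs is an
illustrative name, not from any library); a C or assembler kernel
would express the same recurrence with the hardware add-with-carry
instruction, which is precisely what current Haskell compilers
cannot recover from source like this:

```haskell
import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- Add two little-endian limb strings, propagating the carry from
-- each 32-bit addition into the next; the Word64 intermediate holds
-- the 33-bit sum so the carry is just the high bits.
addLimbs :: [Word32] -> [Word32] -> [Word32]
addLimbs = go 0
  where
    go :: Word64 -> [Word32] -> [Word32] -> [Word32]
    go c [] []
      | c == 0    = []
      | otherwise = [fromIntegral c]
    go c (a:as) (b:bs) = emit c a b as bs
    go c (a:as) []     = emit c a 0 as []
    go c [] (b:bs)     = emit c 0 b [] bs
    emit :: Word64 -> Word32 -> Word32 -> [Word32] -> [Word32] -> [Word32]
    emit c a b as bs =
      let s = fromIntegral a + fromIntegral b + c :: Word64
      in fromIntegral s : go (s `shiftR` 32) as bs
```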
GMP uses hand-written assembler routines for many supported
architectures, partly because GMP was originally created for earlier
versions of GCC which could not optimise as well as current
versions. Even GMP cannot compare to an optimised library using SIMD
(Altivec, SSE)--in my tests, SIMD-optimised algorithms are between 2x
to 10x faster. SIMD and small assembler routines (especially for
architectures without SIMD) are what I have been doing
the bulk of my work on. I doubt I have the ability to extend the
current state of the art with regard to higher-level polynomial
optimisations, so I am always trying out any algorithm I can find.
(For very high precision multiplication (more than 30,000 bits), not
much beats a SIMD-enabled Fast Fourier Transform; a specially coded
Toom-3 algorithm would be faster but for very large operands the
algorithm becomes prohibitively complex. Division is another labour-
intensive area.)
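For context, Toom-3 generalises the two-way Karatsuba split to a
three-way one; a minimal, unoptimised Karatsuba over nonnegative
Integers (illustrative only, with an arbitrary cutoff) shows the
shape of that family:

```haskell
-- Karatsuba multiplication: split each operand at 2^m, then use
-- three recursive multiplications instead of four. Toom-3 does the
-- same with a three-way split and five recursive multiplications.
karatsuba :: Integer -> Integer -> Integer
karatsuba x y
  | x < cutoff || y < cutoff = x * y
  | otherwise = z2 * 2 ^ (2 * m) + z1 * 2 ^ m + z0
  where
    cutoff = 2 ^ (32 :: Int)  -- arbitrary; real cutoffs are tuned
    m = max (bitLen x) (bitLen y) `div` 2
    (xh, xl) = x `divMod` (2 ^ m)
    (yh, yl) = y `divMod` (2 ^ m)
    z0 = karatsuba xl yl
    z2 = karatsuba xh yh
    z1 = karatsuba (xl + xh) (yl + yh) - z0 - z2

-- Number of bits in a nonnegative Integer.
bitLen :: Integer -> Int
bitLen = length . takeWhile (> 0) . iterate (`div` 2)
```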
> - what would be a killer for numerical programming, might still be
> quite acceptable for a substantial part of haskell uses?
Quite possibly, yes. I haven't done my own study of what most users
actually require, but according to notes in the source code of the
Python multi-precision integer library (LongObject), most users of
that library never needed more than 2 or 3 uint32_t in length.
(They had originally reserved 5 uint32_t of space for a newly created
PyLongObject.) Of course even ordinary users would notice a change
in speed if they used a much slower library on larger numbers (79
decimal digits, 256 bits, or more) in an interactive session.
> of course, the real gmp replacement project might be going so well
> that a haskell version would be obsolete rather sooner than later, and
> i certainly don't want to interfere with that effort.
Not at all. All ideas gratefully received :) It's going slowly,
partly because I am constantly having to learn more, test and
refactor so the result isn't a naive solution. I could reimplement
GMP, of course, but many of the GMP algorithms are not readily
amenable to SIMD operations--I am trying to *beat* GMP in speed.
> all that being said, it occurred to me that the representations and
> fusions described in the nice "rewriting haskell strings" paper
> would be a good foundation for a haskell bignum project, wouldn't
> they?
They would. Thanks for the link--I had been working from the source
code. A Haskell-C combined solution may sometimes be faster--I
noticed this while experimenting with a random number generator
library written in C, and came to the conclusion
that the speed was largely due to Haskell's ability to cache the
results of many operations, sometimes better than I could predict on
my own. Term rewriting for a hybrid Haskell-C library could be very
useful for complex polynomials. It might use equational
transformations or Template Haskell; GHC's own Ian Lynagh did some
work on this (on Fraskell, with some code examples, at
http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/Fraskell/; and see
esp.
http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/papers/Template_Haskell-A_Report_From_The_Field.ps,
a PostScript file). The
one problem I have with using equational transformations over arrays
is that it seems very difficult--I would say, impossible without
modifying the compiler--to perform a transformation that works
between two combined arrays when the array operands may be of
different sizes.
This is similar to the solution FFTW (see http://www.fftw.org) uses,
only they used OCaml to perform the analysis on particular equations
and patch the higher-level combinations together again in C--FFTW is
essentially an equational compiler. The problem with many of FFTW's
optimisations is that much of the analysis relies on constant
values--you are writing a program to solve one defined mathematical
problem where you know something from the beginning, such as the size
of the operands (that's a biggie!). The SPIRAL project uses similar
methods for more basic operations, such as multiplierless constant
multiplication (transform multiplication of an unknown value with a
constant into an optimal number of left-shifts and additions). See,
e.g., http://www.spiral.net/hardware/multless.html. An arbitrary
precision Integer library for Haskell must be more general than that:
I have to assume no constants are available.
It is essentially an optimisation problem--but this whole endeavour
is, in a way.
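As a toy illustration of that multiplierless trick in Haskell
(mulByConst is an assumed name; real generators like SPIRAL's also
use subtractions and shared subexpressions to minimise the operation
count):

```haskell
import Data.Bits (finiteBitSize, shiftL, testBit)
import Data.Word (Word32)

-- Multiply x by a known constant k using only shifts and adds: one
-- left-shift-and-add per set bit of k. With k fixed in advance, the
-- list of shift amounts is a compile-time constant and the loop
-- disappears entirely.
mulByConst :: Word32 -> Word32 -> Word32
mulByConst k x =
  sum [ x `shiftL` i | i <- [0 .. finiteBitSize k - 1], testBit k i ]
```

Wrap-around semantics match ordinary Word32 multiplication, so the
two agree on all inputs.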
Cheers,
Pete