Replacement for GMP

Tue Aug 1 12:18:24 EDT 2006

Hi Peter,

Peter Tanski wrote:
> In the July thread, (Repost) Replacement for GMP as Bignum: ARPREC?
> Haskell?; OS X and OpenSSL, you wrote:
> 
>> In past, I tried to get rid of GMP by replacing it with libtommath
>> http://math.libtomcrypt.com/
>> But I have given up for now - because of related and unrelated problems.
> 
> Since I had no prior experience with LibTomMath I decided to take a look
> at it.  The most recent release version of LibTomMath seems to be 0.39. 
> Were some of the "related problems" you ran into due to bugs in an
> earlier version of LibTomMath?  Maybe it is premature for me to go
> looking at replacing Integer when GHC has already moved so far ahead of
> itself I have to service the source code just to get it to build on OS
> X, but I figure that while I have the hood up I might as well take a
> look at the rest of the engine...  If you have any old notes on the
> problems you encountered, I would greatly appreciate it.

What I have written here might not be the most useful guide to
start with, but maybe it is of help for other interested souls.

My first and foremost problem was that I had certain amount time I was
going to use on it and that I use Windows, where ghc build breaks pretty
often - and it was very badly broken at the time.  I simply ran out of
time and motivation.  The resistance I met in community (irc mainly)
for this step was also annoying.  I also went wrong way when I started
and added configure-option, but Simon Marlow (iirc) asked me not to add
configure option, but to obliterate gmp alltogether.

The big steps, to get things done, in this matter are:
 * Understanding how memory handling between math lib and GC works
 * Understanding some C--
 * Understanding autotools and makefile hackery

If you are new to ghc source, you should read
(some of)
http://hackage.haskell.org/trac/ghc/
and
http://darcs.haskell.org/ghc/HACKING

Most of the code for this task is fairly easy, actually.  The bad thing
is that it cannot be easily split into many small steps.
Most of the task has very small scope: Most of the backend doesn't care,
we just use exising bits, frontend and base-package don't see much
or any diffrence.

I'll add here some notes that might be helpful (but as I type these out
of memory, they might not be 100% accurate):

 * The memory handling:
    The idea on most bignum libs is that they have c-structure kinda
    like this:
    struct Big {
       word size, used;
       digits* payload;
       bool sign;
    } ;
   Now, the size and used tell how much memory is allocated for paylod
   and how much of it is used.  sign is the sign (plus/minus).
   payload is a pointer to memory that contains Integer decoded.

   Normally, in c, you would reuse these variables, but with
   immutability, we can't do that.  This is handled in GHC so that
   payload (and sign) are stored in a bytearray, which is native
   garbage collectable type.  It natively contains size.  Before we
   call math-lib, we put together a temporary structure with correct
   pointers.  As for target variable, we have hooked the mathlibs
   memory allocation functions to allocate correctly.  Upon returning
   Integer, we just take payload, write sign on correct place and
   return the payload-pointer (possibly adjusted).

   In pseudo C
   digits* add(digits* din) {
     Big in, out;
     in.size=getLength(din);
     in.used=getLength(din);
     in.payload=din;
     in.sign=getSign(din);
     math_lib_init(out);
     math_lib_add(out, in);
     writeSign(out.payload, out.sign);
     return out.payload;
   }

   There are tricky parts for 64bit-stuff in 32bit systems and some
   floating point decoding uses bad configure-stuff that depends on
   math lib stuff, but mostly it's very boring hundreds of lines of
   C-- (but you can make the job much easier by using preprocessor).

   This is why APPREC might be hard - you need to know the internal
   representation.

 * C-- and related RTS stuff
   C-- is pretty easy, you should read C-- spec at cminusminus.org.
   GHC C-- unfortunately is not really near the C-- spec, it doesn't
   first of all implement it all - but that doesn't matter for this
   task - and then it has some extensions for casting and structure
   reading, I think.  When removing GMP, it is easy to spot places
   that needs corresponding declrations/definitions for new math-lib
   instead of GMP.  Reading old GMP primops, where most of the
   interesting c-- code is at, should also make it clearer - after
   a while, atleast.

 * Autotools and makefiles
   What a mess.  Well, it always is for big projects.  The annoying
   thing here for LibTomMath was that I found out, I'd need to add
   autotool configure-hackery into ghc and make internal libtommath
   dependant on it, so that digit-size and such were the size of native
   "word" on each machine.  There's also a lot of DLL-related stuff
   in makefiles, which, I think, can safely be removed - some of it
   is wrong and outdated, so when/if it gets waken up, it needs to be
   rewritten anyway.  GMP has it's own configure-script and it's
   invoked from the makefiles.

To take on this task, I wouldn't say huge knowledge on ghc internals
is required, but some knowledge on memory handling and low-level
(near asm) programming is useful.  Haskell knowledge for most part
is not required.  Fast machine is useful, but tweaking build options
and having a good book/movies/other machine gets it done.  It'd be
very very useful to have 64-bit platform (in practice 64bit
linux ia32_64) for testing.  I'd say OS X is not a good platform
to do this devel on, but I might be wrong on that.  Linux being
best tested OS, it is the most safe bet.

I'd need to get rid of GMP, and if I can help (short of putting
weeks worth of hours into) just shout.

HTH,
--Esa