Replacement for GMP: Update

Peter Tanski p.tanski at gmail.com
Fri Aug 11 18:56:57 EDT 2006


John,

> After all on the average call where an object of that
> size is free already it is a single array lookup, we have:
>
> (a) fetch pointer (one read)
> (b) fetch next (one read)
> (c) store next as current (one write)

This is true for memory access; it is not true for memory allocation.
I do not know how malloc allocates memory on Windows, but on general
POSIX systems the kernel uses linked lists and a good deal of other
bookkeeping, such as KMEMSTAT, to reduce fragmentation.  Malloc may
also block, which is something you have more control over in your own
garbage-collected heap.  A really good explanation of the problem of
rapidly allocating and deallocating temporary blocks of memory under
35kb is here: http://ridiculousfish.com/blog/archives/2006/05/16/36/
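
For comparison, here is roughly what that fast path looks like written
out: a free list of blocks of one fixed size, falling back to malloc
only when the list is empty.  This is only a sketch to fix ideas
(popBlock and pushBlock are my own names, not anything in GHC or GMP):

import Data.IORef
import Data.Word (Word8)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Ptr (Ptr)

-- A trivial free list for blocks of one fixed size.  The non-empty
-- branch is roughly the read / read / write sequence quoted above;
-- malloc is only reached when the list is empty.
type FreeList = IORef [Ptr Word8]

newFreeList :: IO FreeList
newFreeList = newIORef []

popBlock :: Int -> FreeList -> IO (Ptr Word8)
popBlock size fl = do
  blocks <- readIORef fl                -- (a) fetch the list head pointer
  case blocks of
    (p:rest) -> do writeIORef fl rest   -- (b) fetch next, (c) store it back
                   return p
    []       -> mallocBytes size        -- slow path: a real allocation

pushBlock :: Ptr Word8 -> FreeList -> IO ()
pushBlock p fl = modifyIORef fl (p:)

The interesting number is how often the empty-list branch is taken,
because that is where all of the malloc bookkeeping above comes in.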

In any case, Simon Marlow had previously mentioned that alloc (from
GHC's heap) is faster than malloc.  He is almost certainly correct,
although I hope the difference will not be that great and that the
only thing I have to worry about is ForeignPtr.  We shall see whether
malloc-memory makes a difference in the benchmarks.
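
For what it is worth, the kind of measurement I have in mind looks
roughly like the following.  It is only a sketch: the 64-byte block
size and the iteration count are arbitrary, and a real benchmark would
do more than touch one byte per buffer, but it compares allocation
from GHC's heap against malloc-memory wrapped in a ForeignPtr:

import Control.Monad (replicateM_)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes,
                           newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import System.CPUTime (getCPUTime)

-- Allocate many small buffers from GHC's heap versus from C malloc
-- wrapped in a ForeignPtr, touching each buffer once so the
-- allocation cannot be optimised away.
viaHeap :: Int -> Int -> IO ()
viaHeap n size = replicateM_ n $ do
  fp <- mallocForeignPtrBytes size :: IO (ForeignPtr Word8)
  withForeignPtr fp $ \p -> pokeByteOff p 0 (0 :: Word8)

viaMalloc :: Int -> Int -> IO ()
viaMalloc n size = replicateM_ n $ do
  p  <- mallocBytes size :: IO (Ptr Word8)
  fp <- newForeignPtr finalizerFree p
  withForeignPtr fp $ \q -> pokeByteOff q 0 (0 :: Word8)

timeIt :: String -> IO () -> IO ()
timeIt label act = do
  t0 <- getCPUTime
  act
  t1 <- getCPUTime
  putStrLn (label ++ ": "
            ++ show (fromIntegral (t1 - t0) / 1e12 :: Double) ++ " s")

main :: IO ()
main = do
  timeIt "GHC heap (mallocForeignPtrBytes)" (viaHeap   1000000 64)
  timeIt "malloc + ForeignPtr"              (viaMalloc 1000000 64)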

> A purely functional system -- one which does NOT convert
> self tail calls into jumps and reuse storage -- can
> perhaps be faster, since each thread can have its own
> local arena to allocate from (without need of any write
> barrier) .. however it isn't clear to me what the trade
> off is between cache misses and parallelism.

That is interesting, but I am not sure whether your mention of self
tail calls turned into jumps refers to the low level or the high
level.  From the context it seems you are talking about a high-level
implementation, with each function running in a separate thread.
GHC's RTS does use many separate threads (a threaded RTS is available
in the latest version, 6.6).

As for turning self tail calls into jumps at the low level, GHC does
do this through C-- (the GHC implementation of C-- is called Cmm); a
small example is below.  I believe that is both faster and more memory
efficient than a high-level threaded system.

Philosophically speaking, even if Simon Peyton Jones developed Cmm to
solve the problem of efficient functional computation, Cmm has shifted
the research around Haskell from research on a computer language to
research on a system of computation.  (Maybe that is what he meant
when, some time ago, he wrote to John Meacham and said that the GHC
researchers considered compiling via C a dead end.)  A language can be
anything: all it requires is a means of being translated into machine
code.  Sophisticated compiler systems such as gcc, JHC for Haskell, or
OCaml for the Caml branch of ML may translate programming languages
into machine code, but the underlying computation remains largely
sequential.  The curious thing about GHC Haskell is that, through the
prism of Cmm, which enforces such things as immutable variables and
recursion right at the machine level, Haskell is less a language
translated to sequential machine code and more the description of a
computational model.  If you still think I am wrong about this,
consider the possibility that Haskell with Cmm is a modern research
project in the same spirit as the one that motivated Lisp: a different
model of computation.
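
To make the tail-call point concrete, a self-recursive loop such as
the one below is compiled by GHC, via Cmm, into a jump back to the top
of the function rather than a nested call, so the loop runs in
constant stack space.  (A toy example only; the seq simply keeps the
accumulator strict even without optimisation.)

-- The recursive call to go is a self tail call; in the generated Cmm
-- it becomes a jump, not a call, so the loop consumes no stack.
sumTo :: Int -> Int
sumTo n = go 0 n
  where
    go :: Int -> Int -> Int
    go acc 0 = acc
    go acc k = acc `seq` go (acc + k) (k - 1)

main :: IO ()
main = print (sumTo 1000000)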

Best regards,
Peter


