[Haskell-cafe] Re: Garbage collecting pointers

Simon Marlow marlowsd at gmail.com
Mon Mar 29 12:00:03 EDT 2010


On 26/03/2010 20:28, Mads Lindstrøm wrote:
> Hi
>
> For some time I have been thinking about an idea, which could limit
> Haskell's memory footprint. I don't know if the idea is crazy or clever,
> but I would love to hear peoples thoughts about it. The short story is,
> I propose that the garbage collector should not just reclaim unused
> memory, it should also diminish the need for pointers, by replacing
> nested data structures with larger chunks of consecutive memory. In
> other words, I would diminish the need for pointers for arbitrary
> recursive data types, just as replacing linked lists with arrays
> eliminate the need for pointers.
>
> I will explain my idea by an example of a data type we all know and
> love:
>
> data List a = Cons a (List a) | Nil
>
> each Cons cell uses two pointers - one for the element and one for the
> rest of the list. If we could somehow merge the element and the rest of
> the list into consecutive memory, we would be able to eliminate the
> pointer to the rest of list. On 64 bit architectures merging would save
> us 8 bytes of "wasted" memory. If we could merge n elements together we
> could save n*8 bytes of memory.

The trouble with techniques like this is that they break the uniformity 
of the representation, and complexity leaks all over the place.  Various 
similar ideas have been tried in the past, though not with Haskell as 
far as I'm aware: CDR-coding and BiBOP spring to mind.

> 64 bit pointers are wasteful in another way, as nobody has anywhere near
> 2^64 bytes of memory. And nobody is going to have that much memory for
> decades to come. Thus, we could use the 8 most significant bits for
> something besides pointing to memory, without loosing any significant
> ability to address our memories. This would be similar to how GHC uses
> some of the least significant bits for pointer tagging.

Unfortunatley you don't know whereabouts in your address space your 
memory is located: the OS could do randomised allocation and give you a 
pages all over the place, so in fact you might need all 64 bits.  Yes 
you can start doing OS-specific things, but that always leads to pain 
later (I know, I've been there).

In the Java community there has been experimentation with using 
different representations to avoid the 64-bit pointer overhead: e.g. 
shifting a 32-bit pointer to the left by 2 bits in order to access 16GB 
of memory.  Personally I'm not keen on doing this kind of trick, mainly 
because in GHC it would be a compile-time switch; Java has it easy here 
because they can make the choice at runtime.  Also we already use the 
low 2 bits of pointers in GHC for pointer tagging.

Cheers,
	Simon


More information about the Haskell-Cafe mailing list