[Haskell-cafe] Re: Garbage collecting pointers

Mon Mar 29 12:36:45 EDT 2010

On Mon, Mar 29, 2010 at 12:00 PM, Simon Marlow <marlowsd at gmail.com> wrote:

> On 26/03/2010 20:28, Mads Lindstrøm wrote:
>
>> Hi
>>
>> For some time I have been thinking about an idea, which could limit
>> Haskell's memory footprint. I don't know if the idea is crazy or clever,
>> but I would love to hear peoples thoughts about it. The short story is,
>> I propose that the garbage collector should not just reclaim unused
>> memory, it should also diminish the need for pointers, by replacing
>> nested data structures with larger chunks of consecutive memory. In
>> other words, I would diminish the need for pointers for arbitrary
>> recursive data types, just as replacing linked lists with arrays
>> eliminate the need for pointers.
>>
>> I will explain my idea by an example of a data type we all know and
>> love:
>>
>> data List a = Cons a (List a) | Nil
>>
>> each Cons cell uses two pointers - one for the element and one for the
>> rest of the list. If we could somehow merge the element and the rest of
>> the list into consecutive memory, we would be able to eliminate the
>> pointer to the rest of list. On 64 bit architectures merging would save
>> us 8 bytes of "wasted" memory. If we could merge n elements together we
>> could save n*8 bytes of memory.
>>
>
> The trouble with techniques like this is that they break the uniformity of
> the representation, and complexity leaks all over the place.  Various
> similar ideas have been tried in the past, though not with Haskell as far as
> I'm aware: CDR-coding and BiBOP spring to mind.
>

While CDR-coding can introduce one hell of a complexity explosion if you're
not careful, using an (optional) BiBOP-like representation for the tag isn't
that bad. I've been experimenting with it in a lightweight Haskell-like
language.

If you associate with each page an optional tag for that page, and then when
looking up the tag first check the page level. That way you can use a bump
allocator initially, and as tags benchmark as being more common during
garbage collection, you can dedicate pages to them. With GHC-style pointer
tagging you rarely look at the actual tag anyways, so this extra cost is
only incurred when forcing the thunk initially, or when the tag can't be
applied (too many constructors).

 64 bit pointers are wasteful in another way, as nobody has anywhere near
>> 2^64 bytes of memory. And nobody is going to have that much memory for
>> decades to come. Thus, we could use the 8 most significant bits for
>> something besides pointing to memory, without loosing any significant
>> ability to address our memories. This would be similar to how GHC uses
>> some of the least significant bits for pointer tagging.
>>
>
> Unfortunatley you don't know whereabouts in your address space your memory
> is located: the OS could do randomised allocation and give you a pages all
> over the place, so in fact you might need all 64 bits.  Yes you can start
> doing OS-specific things, but that always leads to pain later (I know, I've
> been there).
>
> In the Java community there has been experimentation with using different
> representations to avoid the 64-bit pointer overhead: e.g. shifting a 32-bit
> pointer to the left by 2 bits in order to access 16GB of memory.  Personally
> I'm not keen on doing this kind of trick, mainly because in GHC it would be
> a compile-time switch; Java has it easy here because they can make the
> choice at runtime.  Also we already use the low 2 bits of pointers in GHC
> for pointer tagging.
>

The closest I've been able to get to making this scheme work is to use the
MSB of the pointer as an 'evaluated' bit, and using 16 byte aligned
'CompressedOOPs' checking evaluation by something like:

add eax, eax -- set carry based on evaluation flag, and double the base
pointer
jc evaluated
-- otherwise fetch values using base+[eax*8]+slot

Of course, knowing that something is evaluated is far less useful than
knowing its actual tag, but the language in question lacks some of the
niceties of Haskell with regards to optimization potential here. Though it
does give access to a nice 32 gig heap at an arbitrary base address with 16
byte alignment, which makes it suitable to use SSE opcodes to manipulate
thunks.

The biggest problem with CompressedOOPs, for me, is the lack of linker
support.

You can ask for a segment to be given a chunk of memory below the 2 gig
marker, but not below 4, let alone 32, so 0-based compressed OOPs can't be
linked by the native linker. It wasn't until they added 0-based compressed
OOPs that things finally started to perform for Java, and bringing your own
linker into the mix is a decidedly unpleasant option. I'm continuing to play
with them in a JIT-only environment but they don't play nice with
compilation.

-Edward Kmett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100329/49b283f7/attachment.html