[Haskell-cafe] Garbage collecting pointers

Sat Mar 27 00:15:57 EDT 2010

I have to Point out that any such scheme as is being described would
need to be done quite carefully as to not break pass by reference data
semantics that Haskell enjoys/ the wealth of sharing

Moreover, as I understand it, something like this only is feasible in
general for statically sized data structures (though there has been
research on sml nj in the 90s that I thnk was  in the phd thesis of
zhong shao I believe that detailed an evaluation of an approach to
chunking cons cells of lists into contiguous array like chunks), and
that in general any approach to this sort of cleverness would require
a lot more dictionary passing going on than Haskell already deals
with.

Additionally, Im fairly certain both that any non heuristic approach
to such optimization would quickly hit some computational
intractibility ala knapsack and also play merry hell with satisfying
memory alignment constaints required by some ABIs and encouraged by
some processors.

On Friday, March 26, 2010, Sebastian Sylvan <sebastian.sylvan at gmail.com> wrote:
>
>
> On Fri, Mar 26, 2010 at 10:52 PM, Mads Lindstrøm <mads_lindstroem at yahoo.dk> wrote:
>
> Hi
>
> On Fri, 2010-03-26 at 21:33 +0000, Sebastian Sylvan wrote:
>
>>
>>
>> Reorganizing data on the fly sounds like it may be a pretty sensible
>> idea now that cache misses are so bad (in comparison). The fact that
>> Haskell data is generally immutable helps too.
>> However, I think your scheme sounds a bit complicated for my tastes,
>> and I don't really understand the idea of "lengths", what would "n
>> consecutive elements" mean for a tree? Surely this wouldn't work just
>> for list-like data structures?
>
> First of all, I had assumed that a data type's memory footprint was
> independent of its value. Similarly to how a C-union always occupies the
> same amount of memory. After reading your post, I reevaluated my
> assumption and my assumption might just be wrong.
>
> Take the Maybe data type. The Nothing value occupies no space other than the Nothing-tag, the Just part occupies more (potentially a lot more, if it too has been collapsed!). It's conceivable that you would force the size to be the same a la C' unions, and have some thresholding where if one of the possible alternatives is too big then it would always use a pointer for that one alternative, to minimize padding.
>
> What do "heavily quantized lookup table" mean? It is the "heavily
> quantized" part I do not get.
>
> E.g. store a small number of bits of offset information per field in a record at the top, let's say 8 bits per field. Now, that means you can have 256 unique values, and therefore fields, but it nothing says that the offset value needs to refer to "bytes", it could just as easily refer to "multiples of 16 bytes" for larger structures. This means you would need to align all your fields to 16-byte boundaries, which may need padding, but lets you refer to fields with a larger offset without using a full pointer per field.
>
> Maybe your gain is wasted by doing too much logic here. Perhaps a more conservative suggestion would be to keep the pointers, so the runtime code is the same, but store a little tag indicating that a value has an optional memory extent for the purposes of GC. This way you could compact the leaves of a data graph (still keeping the pointers), which lets the GC stop following pointers once it hits the tag and reads the memory extent saying "this is a some data totalling 612 bytes, and I promise that any pointers in this memory block are all internal to the block". You'd reduce GC work, and more importantly cache misses, but not overall memory usage.
>
> --
> Sebastian Sylvan
>