[jhc] Re: [Haskell] ANNOUNCE: jhc 0.7.4

Mon Jul 26 20:54:04 EDT 2010

On Tue, Jul 27, 2010 at 12:42:27AM +0100, Thomas Schilling wrote:
> >  * a garbage colletor which can be enabled with '-fjgc' is available.
> >   after some more testing, I will make it the default in the 0.7.5 release.
> >  * A new memory allocator based on Jeff Bonwick's slab memory allocator for
> >   kernels. This means that in many cases, memory can be pulled off a slab
> >   and immediately used with no initialization, in addition, it allowed a
> >   rearrangement of the GC meta-data to optimizally use the cache.
> 
> That sounds like you're essentially using BIBOP (big bag of pages,
> http://www.memorymanagement.org/glossary/b.html#bibop).  Did you
> compare the overheads with the header-based scheme?  Do you still need
> a tag/header word or do you get the tag by looking up the per page
> info?  Do you use this scheme for all objects or only for small
> objects?  I could imagine a hybrid scheme where where you use bibop
> for, say, single-constructor objects, although I wonder how you would
> deal with thunks and their update.

Yeah, it is somewhat similar to that, except I don't need to look up in
a table what the type is, I just take the address modulo the block size
and there is a shared info section at the front of the block. all the GC
bookeeping bits are kept here too and can usually fit on a single cache
line, that way you can mark a object used without actually having to
dereference a pointer to it, allowing the cache line for the object
itself to fill hopefully by the time I get around to it on the grey
list. objects that contain no pointers are never touched by the GC, an
advantage of keeping the bookeeping bits away from the objects.

There is no per object overhead/tag word imposed by the storage manager,
other than the one or two bits used by the GC. (one bit right now, but I
will need to expand it to add some GC features that are missing
(destructors, incremential collection)).

I currently use this scheme for all objects, however, beyond a certain
size I think a header approach might be better, so I will look into
that. In particluar, my arrary sizes are limited to what can fit in a
block right now when you have the GC on, which is a limitation I need to
fix.

I did experiment with a header scheme, however, it was never really
optimized that well so I don't think I could make performance
conclusions from it. I may add a switch to control this though.
Honestly, a slab inspired allocator has been my plan since day one of
writing jhc, I had a fair amount of experience with them from my
previous OS work and felt they would be a good fit for jhcs model.

> >  * packed representation of algebraic types, 'Maybe Foo' is actually
> >   represented in memory as a NULL pointer or just a 'Foo' with no tag bits.
> >   Combined with the slab allocator, this can double the number of some common
> >   values that can be put in a cache line.
> 
> Interesting.  How do you distinguish  'Just e' from 'Just _|_'?  Do
> you need the whole program assumption to disprove that the latter case
> can happen?

each word has two tag bits, one says whether the value is in WHNF, the
other says whether it should be followed by the GC. So to eval
something, I check the tag bit, if it says WHNF I am done, otherwise the
object must be one of a thunk or an indirection, if it is a thunk, it
has a code pointer as its first entry, otherwise it is a redirection
that _must_ be a WHNF pointer.

Unlike ghc, there are strong invarients about what sort of objects can
appear where, WHNF and non WHNF values actually have different types in
Grin and C. For instance, an indirection to an indirection is
impossible, as is redirecting a whnf value, this also means a lot of
things that are easy for ghc, a collector that moves things around
incrementally, implementing concurrent locks by overwriting an
evaluation pointer, etc are difficult in jhc. So there are tradeoffs. (a
selectable alternate RTS may mitigate this though)

Now, the interesting thing is that if the WHNF bit is set, there are no
rts or gc imposed restrictions on the word whatsoever, it need not even
be a pointer, and often isn't. (unless the gc flag is set, in which case
it has to be a heap pointer, but the format of what it is pointing to is
completly undefined). For instance, data Foo !Word8 !Word8 will
theoretically be packed directly into the word itself as the two most
signifigant bytes.

For each type I can statically generate an "optimal" layout based on its
structure. For instance, maybe benefits from two of these optimizations,
first of all, nullary constructors (Nothing) need never appear in the
heap, so they are given values that pack directly into a word, this
happens for all nullay constructors in general. Then, since there is
only one constructor left (Just) we can discard the tag field, because
if something of type Maybe, is in WHNF, and is not Nothing then in must be
a Just. so the Just is a single heap allocated word (which are packed
end to end in a page with no overhead)

I am refining the optimization algorithm, on 64 bit machines I have
another few bits to play with for instance that I can take advantage of.

A particularly nice case is that characters can be fully integreated
into the word, so the entire space usage of 

"Hello" is 10 words as lists benefit from the same packing benefits as
Maybe.

compare this to 30 or so words used in a traditional "everything is on
the heap and tagged" model.

The manual has a section describing the RTS, it doesn't describe the GC
though, but talks about how I pack things in the pointers.

http://repetae.net/computer/jhc/manual.html#the-run-time-system

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/