ArrayArrays

Edward Z. Yang ezyang at mit.edu
Thu Aug 27 17:24:12 UTC 2015


It seems to me that we should take a page from OCaml's playbook
and add support for native mutable fields in objects, because
this is essentially what a mix of words and pointers is.

The big question, as always, is what the syntax should be.

Edward

Excerpts from Ryan Yates's message of 2015-08-21 06:49:47 -0700:
> Hi Edward,
> 
> I've been working on removing indirection in STM and I added a heap
> object like SmallArray, but with a mix of words and pointers (as well
> as a header with metadata for STM).  It appears to work well now, but
> it is missing the type information.  All the pointers have the same
> type which works fine for your Upper.  In my case I use it to
> represent a red-black tree node [1].
> 
> Also all the structures I make are fixed size and it would be nice if
> the compiler could treat that fix size like a constant in code
> generation.  I don't know what the right design is or what would be
> needed, but it seems simple enough to give the right typing
> information to something like this and basically get a mutable struct.
> I'm talking about this work at HIW and really hope to find someone
> interested in extending this expressiveness to let us write something
> that looks clear in Haskell, but gives the heap representation that we
> really need for performance.  From the RTS perspective I think there
> are any obstacles.
> 
> [1]: https://github.com/fryguybob/ghc-stm-benchmarks/blob/master/benchmarks/RBTree-Throughput/RBTreeNode.hs
> 
> Ryan
> 
> On Fri, Aug 21, 2015 at 12:25 AM, Edward Kmett <ekmett at gmail.com> wrote:
> > When (ab)using them for this purpose, SmallArrayArray's would be very handy
> > as well.
> >
> > Consider right now if I have something like an order-maintenance structure I
> > have:
> >
> > data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# UNPACK #-}
> > !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s (Upper s))
> >
> > data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-}
> > !(MutableByteArray s) {-# UNPACK #-} !(MutVar s (Lower s)) {-# UNPACK #-}
> > !(MutVar s (Lower s))
> >
> > The former contains, logically, a mutable integer and two pointers, one for
> > forward and one for backwards. The latter is basically the same thing with a
> > mutable reference up pointing at the structure above.
> >
> > On the heap this is an object that points to a structure for the bytearray,
> > and points to another structure for each mutvar which each point to the
> > other 'Upper' structure. So there is a level of indirection smeared over
> > everything.
> >
> > So this is a pair of doubly linked lists with an upward link from the
> > structure below to the structure above.
> >
> > Converted into ArrayArray#s I'd get
> >
> > data Upper s = Upper (MutableArrayArray# s)
> >
> > w/ the first slot being a pointer to a MutableByteArray#, and the next 2
> > slots pointing to the previous and next previous objects, represented just
> > as their MutableArrayArray#s. I can use sameMutableArrayArray# on these for
> > object identity, which lets me check for the ends of the lists by tying
> > things back on themselves.
> >
> > and below that
> >
> > data Lower s = Lower (MutableArrayArray# s)
> >
> > is similar, with an extra MutableArrayArray slot pointing up to an upper
> > structure.
> >
> > I can then write a handful of combinators for getting out the slots in
> > question, while it has gained a level of indirection between the wrapper to
> > put it in * and the MutableArrayArray# s in #, that one can be basically
> > erased by ghc.
> >
> > Unlike before I don't have several separate objects on the heap for each
> > thing. I only have 2 now. The MutableArrayArray# for the object itself, and
> > the MutableByteArray# that it references to carry around the mutable int.
> >
> > The only pain points are
> >
> > 1.) the aforementioned limitation that currently prevents me from stuffing
> > normal boxed data through a SmallArray or Array into an ArrayArray leaving
> > me in a little ghetto disconnected from the rest of Haskell,
> >
> > and
> >
> > 2.) the lack of SmallArrayArray's, which could let us avoid the card marking
> > overhead. These objects are all small, 3-4 pointers wide. Card marking
> > doesn't help.
> >
> > Alternately I could just try to do really evil things and convert the whole
> > mess to SmallArrays and then figure out how to unsafeCoerce my way to glory,
> > stuffing the #'d references to the other arrays directly into the SmallArray
> > as slots, removing the limitation  we see here by aping the
> > MutableArrayArray# s API, but that gets really really dangerous!
> >
> > I'm pretty much willing to sacrifice almost anything on the altar of speed
> > here, but I'd like to be able to let the GC move them and collect them which
> > rules out simpler Ptr and Addr based solutions.
> >
> > -Edward
> >
> > On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty
> > <chak at cse.unsw.edu.au> wrote:
> >>
> >> That’s an interesting idea.
> >>
> >> Manuel
> >>
> >> > Edward Kmett <ekmett at gmail.com>:
> >> >
> >> > Would it be possible to add unsafe primops to add Array# and SmallArray#
> >> > entries to an ArrayArray#? The fact that the ArrayArray# entries are all
> >> > directly unlifted avoiding a level of indirection for the containing
> >> > structure is amazing, but I can only currently use it if my leaf level data
> >> > can be 100% unboxed and distributed among ByteArray#s. It'd be nice to be
> >> > able to have the ability to put SmallArray# a stuff down at the leaves to
> >> > hold lifted contents.
> >> >
> >> > I accept fully that if I name the wrong type when I go to access one of
> >> > the fields it'll lie to me, but I suppose it'd do that if i tried to use one
> >> > of the members that held a nested ArrayArray# as a ByteArray# anyways, so it
> >> > isn't like there is a safety story preventing this.
> >> >
> >> > I've been hunting for ways to try to kill the indirection problems I get
> >> > with Haskell and mutable structures, and I could shoehorn a number of them
> >> > into ArrayArrays if this worked.
> >> >
> >> > Right now I'm stuck paying for 2 or 3 levels of unnecessary indirection
> >> > compared to c/java and this could reduce that pain to just 1 level of
> >> > unnecessary indirection.
> >> >
> >> > -Edward
> >> > _______________________________________________
> >> > ghc-devs mailing list
> >> > ghc-devs at haskell.org
> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >>
> >
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >


More information about the ghc-devs mailing list