ArrayArrays

Edward Kmett ekmett at gmail.com
Sat Aug 29 02:09:56 UTC 2015


I'd love to have that last 10%, but its a lot of work to get there and more
importantly I don't know quite what it should look like.

On the other hand, I do have a pretty good idea of how the primitives above
could be banged out and tested in a long evening, well in time for 7.12.
And as noted earlier, those remain useful even if a nicer typed version
with an extra level of indirection to the sizes is built up after.

The rest sounds like a good graduate student project for someone who has
graduate students lying around. Maybe somebody at Indiana University who
has an interest in type theory and parallelism can find us one. =)

-Edward

On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates <fryguybob at gmail.com> wrote:

> I think from my perspective, the motivation for getting the type
> checker involved is primarily bringing this to the level where users
> could be expected to build these structures.  it is reasonable to
> think that there are people who want to use STM (a context with
> mutation already) to implement a straight forward data structure that
> avoids extra indirection penalty.  There should be some places where
> knowing that things are field accesses rather then array indexing
> could be helpful, but I think GHC is good right now about handling
> constant offsets.  In my code I don't do any bounds checking as I know
> I will only be accessing my arrays with constant indexes.  I make
> wrappers for each field access and leave all the unsafe stuff in
> there.  When things go wrong though, the compiler is no help.  Maybe
> template Haskell that generates the appropriate wrappers is the right
> direction to go.
> There is another benefit for me when working with these as arrays in
> that it is quite simple and direct (given the hoops already jumped
> through) to play with alignment.  I can ensure two pointers are never
> on the same cache-line by just spacing things out in the array.
>
> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett <ekmett at gmail.com> wrote:
> > They just segfault at this level. ;)
> >
> > Sent from my iPhone
> >
> > On Aug 28, 2015, at 7:25 PM, Ryan Newton <rrnewton at gmail.com> wrote:
> >
> > You presumably also save a bounds check on reads by hard-coding the
> sizes?
> >
> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett <ekmett at gmail.com> wrote:
> >>
> >> Also there are 4 different "things" here, basically depending on two
> >> independent questions:
> >>
> >> a.) if you want to shove the sizes into the info table, and
> >> b.) if you want cardmarking.
> >>
> >> Versions with/without cardmarking for different sizes can be done pretty
> >> easily, but as noted, the infotable variants are pretty invasive.
> >>
> >> -Edward
> >>
> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett <ekmett at gmail.com> wrote:
> >>>
> >>> Well, on the plus side you'd save 16 bytes per object, which adds up if
> >>> they were small enough and there are enough of them. You get a bit
> better
> >>> locality of reference in terms of what fits in the first cache line of
> them.
> >>>
> >>> -Edward
> >>>
> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton <rrnewton at gmail.com>
> wrote:
> >>>>
> >>>> Yes. And for the short term I can imagine places we will settle with
> >>>> arrays even if it means tracking lengths unnecessarily and
> unsafeCoercing
> >>>> pointers whose types don't actually match their siblings.
> >>>>
> >>>> Is there anything to recommend the hacks mentioned for fixed sized
> array
> >>>> objects *other* than using them to fake structs? (Much to
> derecommend, as
> >>>> you mentioned!)
> >>>>
> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett <ekmett at gmail.com>
> wrote:
> >>>>>
> >>>>> I think both are useful, but the one you suggest requires a lot more
> >>>>> plumbing and doesn't subsume all of the usecases of the other.
> >>>>>
> >>>>> -Edward
> >>>>>
> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton <rrnewton at gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> So that primitive is an array like thing (Same pointed type,
> unbounded
> >>>>>> length) with extra payload.
> >>>>>>
> >>>>>> I can see how we can do without structs if we have arrays,
> especially
> >>>>>> with the extra payload at front. But wouldn't the general solution
> for
> >>>>>> structs be one that that allows new user data type defs for # types?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett <ekmett at gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> Some form of MutableStruct# with a known number of words and a
> known
> >>>>>>> number of pointers is basically what Ryan Yates was suggesting
> above, but
> >>>>>>> where the word counts were stored in the objects themselves.
> >>>>>>>
> >>>>>>> Given that it'd have a couple of words for those counts it'd likely
> >>>>>>> want to be something we build in addition to MutVar# rather than a
> >>>>>>> replacement.
> >>>>>>>
> >>>>>>> On the other hand, if we had to fix those numbers and build info
> >>>>>>> tables that knew them, and typechecker support, for instance, it'd
> get
> >>>>>>> rather invasive.
> >>>>>>>
> >>>>>>> Also, a number of things that we can do with the 'sized' versions
> >>>>>>> above, like working with evil unsized c-style arrays directly
> inline at the
> >>>>>>> end of the structure cease to be possible, so it isn't even a pure
> win if we
> >>>>>>> did the engineering effort.
> >>>>>>>
> >>>>>>> I think 90% of the needs I have are covered just by adding the one
> >>>>>>> primitive. The last 10% gets pretty invasive.
> >>>>>>>
> >>>>>>> -Edward
> >>>>>>>
> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton <rrnewton at gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I like the possibility of a general solution for mutable structs
> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's hard.
> >>>>>>>>
> >>>>>>>> So, we can't unpack MutVar into constructors because of object
> >>>>>>>> identity problems. But what about directly supporting an
> extensible set of
> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing)
> MutVar#? That
> >>>>>>>> may be too much work, but is it problematic otherwise?
> >>>>>>>>
> >>>>>>>> Needless to say, this is also critical if we ever want best in
> class
> >>>>>>>> lockfree mutable structures, just like their Stm and sequential
> >>>>>>>> counterparts.
> >>>>>>>>
> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones
> >>>>>>>> <simonpj at microsoft.com> wrote:
> >>>>>>>>>
> >>>>>>>>> At the very least I'll take this email and turn it into a short
> >>>>>>>>> article.
> >>>>>>>>>
> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and
> maybe
> >>>>>>>>> make a ticket for it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com]
> >>>>>>>>> Sent: 27 August 2015 16:54
> >>>>>>>>> To: Simon Peyton Jones
> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs
> >>>>>>>>> Subject: Re: ArrayArrays
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It
> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> While those live in #, they are garbage collected objects, so
> this
> >>>>>>>>> all lives on the heap.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> They were added to make some of the DPH stuff fast when it has to
> >>>>>>>>> deal with nested arrays.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The Problem
> >>>>>>>>>
> >>>>>>>>> -----------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Consider the scenario where you write a classic doubly-linked
> list
> >>>>>>>>> in Haskell.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers on
> >>>>>>>>> the heap.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~>
> Maybe
> >>>>>>>>> DLL ~> DLL
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That is 3 levels of indirection.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We can trim one by simply unpacking the IORef with
> >>>>>>>>> -funbox-strict-fields or UNPACK
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and
> >>>>>>>>> worsening our representation.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> but now we're still stuck with a level of indirection
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This means that every operation we perform on this structure will
> >>>>>>>>> be about half of the speed of an implementation in most other
> languages
> >>>>>>>>> assuming we're memory bound on loading things into cache!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Making Progress
> >>>>>>>>>
> >>>>>>>>> ----------------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have been working on a number of data structures where the
> >>>>>>>>> indirection of going from something in * out to an object in #
> which
> >>>>>>>>> contains the real pointer to my target and coming back
> effectively doubles
> >>>>>>>>> my runtime.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We go out to the MutVar# because we are allowed to put the
> MutVar#
> >>>>>>>>> onto the mutable list when we dirty it. There is a well defined
> >>>>>>>>> write-barrier.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I could change out the representation to use
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I can just store two pointers in the MutableArray# every time,
> but
> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount of
> distinct
> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per
> object to 2.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I still have to go out to the heap from my DLL and get to the
> array
> >>>>>>>>> object and then chase it to the next DLL and chase that to the
> next array. I
> >>>>>>>>> do get my two pointers together in memory though. I'm paying for
> a card
> >>>>>>>>> marking table as well, which I don't particularly need with just
> two
> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#"
> machinery added
> >>>>>>>>> back in 7.10, which is just the old array code a a new data
> type, which can
> >>>>>>>>> speed things up a bit when you don't have very big arrays:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> But what if I wanted my object itself to live in # and have two
> >>>>>>>>> mutable fields and be able to share the sme write barrier?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> An ArrayArray# points directly to other unlifted array types.
> What
> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the
> impedence
> >>>>>>>>> mismatch between the imperative world and Haskell, and then just
> let the
> >>>>>>>>> ArrayArray#'s hold other arrayarrays.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> now I need to make up a new Nil, which I can just make be a
> special
> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even
> abuse pattern
> >>>>>>>>> synonyms. Alternately I can exploit the internals further to
> make this
> >>>>>>>>> cheaper.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Then I can use the readMutableArrayArray# and
> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding
> and next
> >>>>>>>>> entry in the linked list.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a
> >>>>>>>>> strict world, and everything there lives in #.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> next :: DLL -> IO DLL
> >>>>>>>>>
> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# s of
> >>>>>>>>>
> >>>>>>>>>    (# s', n #) -> (# s', DLL n #)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to
> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty
> easily when they
> >>>>>>>>> are known strict and you chain operations of this sort!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Cleaning it Up
> >>>>>>>>>
> >>>>>>>>> ------------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Now I have one outermost indirection pointing to an array that
> >>>>>>>>> points directly to other arrays.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I'm stuck paying for a card marking table per object, but I can
> fix
> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using a
> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a
> mixture of
> >>>>>>>>> SmallMutableArray# fields and normal ones in the data structure.
> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the
> existing
> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the
> arguments it
> >>>>>>>>> takes.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is almost ideal, but not quite. I often have fields that
> would
> >>>>>>>>> be best left unboxed.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently at
> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a
> boxed or at a
> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int in
> question in
> >>>>>>>>> there.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to
> >>>>>>>>> store masks and administrivia as I walk down the tree. Having to
> go off to
> >>>>>>>>> the side costs me the entire win from avoiding the first pointer
> chase.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could
> >>>>>>>>> construct that had n words with unsafe access and m pointers to
> other heap
> >>>>>>>>> objects, one that could put itself on the mutable list when any
> of those
> >>>>>>>>> pointers changed then I could shed this last factor of two in all
> >>>>>>>>> circumstances.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Prototype
> >>>>>>>>>
> >>>>>>>>> -------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Over the last few days I've put together a small prototype
> >>>>>>>>> implementation with a few non-trivial imperative data structures
> for things
> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and
> >>>>>>>>> order-maintenance.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> https://github.com/ekmett/structs
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Notable bits:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of
> link-cut
> >>>>>>>>> trees in this style.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that
> make
> >>>>>>>>> it go fast.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost all
> >>>>>>>>> the references to the LinkCut or Object data constructor get
> optimized away,
> >>>>>>>>> and we're left with beautiful strict code directly mutating out
> underlying
> >>>>>>>>> representation.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> At the very least I'll take this email and turn it into a short
> >>>>>>>>> article.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Edward
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones
> >>>>>>>>> <simonpj at microsoft.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Just to say that I have no idea what is going on in this thread.
> >>>>>>>>> What is ArrayArray?  What is the issue in general?  Is there a
> ticket? Is
> >>>>>>>>> there a wiki page?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If it’s important, an ab-initio wiki page + ticket would be a
> good
> >>>>>>>>> thing.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
> Of
> >>>>>>>>> Edward Kmett
> >>>>>>>>> Sent: 21 August 2015 05:25
> >>>>>>>>> To: Manuel M T Chakravarty
> >>>>>>>>> Cc: Simon Marlow; ghc-devs
> >>>>>>>>> Subject: Re: ArrayArrays
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be
> >>>>>>>>> very handy as well.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Consider right now if I have something like an order-maintenance
> >>>>>>>>> structure I have:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-#
> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s
> (Upper s))
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-#
> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s
> (Lower s)) {-#
> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s))
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The former contains, logically, a mutable integer and two
> pointers,
> >>>>>>>>> one for forward and one for backwards. The latter is basically
> the same
> >>>>>>>>> thing with a mutable reference up pointing at the structure
> above.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On the heap this is an object that points to a structure for the
> >>>>>>>>> bytearray, and points to another structure for each mutvar which
> each point
> >>>>>>>>> to the other 'Upper' structure. So there is a level of
> indirection smeared
> >>>>>>>>> over everything.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So this is a pair of doubly linked lists with an upward link from
> >>>>>>>>> the structure below to the structure above.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Converted into ArrayArray#s I'd get
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and the
> >>>>>>>>> next 2 slots pointing to the previous and next previous objects,
> represented
> >>>>>>>>> just as their MutableArrayArray#s. I can use
> sameMutableArrayArray# on these
> >>>>>>>>> for object identity, which lets me check for the ends of the
> lists by tying
> >>>>>>>>> things back on themselves.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> and below that
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to
> an
> >>>>>>>>> upper structure.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I can then write a handful of combinators for getting out the
> slots
> >>>>>>>>> in question, while it has gained a level of indirection between
> the wrapper
> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can
> be basically
> >>>>>>>>> erased by ghc.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Unlike before I don't have several separate objects on the heap
> for
> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the
> object itself,
> >>>>>>>>> and the MutableByteArray# that it references to carry around the
> mutable
> >>>>>>>>> int.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The only pain points are
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me from
> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an
> ArrayArray
> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of
> Haskell,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> and
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the
> >>>>>>>>> card marking overhead. These objects are all small, 3-4 pointers
> wide. Card
> >>>>>>>>> marking doesn't help.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Alternately I could just try to do really evil things and convert
> >>>>>>>>> the whole mess to SmallArrays and then figure out how to
> unsafeCoerce my way
> >>>>>>>>> to glory, stuffing the #'d references to the other arrays
> directly into the
> >>>>>>>>> SmallArray as slots, removing the limitation  we see here by
> aping the
> >>>>>>>>> MutableArrayArray# s API, but that gets really really dangerous!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the altar
> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them
> and collect
> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Edward
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty
> >>>>>>>>> <chak at cse.unsw.edu.au> wrote:
> >>>>>>>>>
> >>>>>>>>> That’s an interesting idea.
> >>>>>>>>>
> >>>>>>>>> Manuel
> >>>>>>>>>
> >>>>>>>>> > Edward Kmett <ekmett at gmail.com>:
> >>>>>>>>>
> >>>>>>>>> >
> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and
> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the
> ArrayArray# entries
> >>>>>>>>> > are all directly unlifted avoiding a level of indirection for
> the containing
> >>>>>>>>> > structure is amazing, but I can only currently use it if my
> leaf level data
> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd be
> nice to be
> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at
> the leaves to
> >>>>>>>>> > hold lifted contents.
> >>>>>>>>> >
> >>>>>>>>> > I accept fully that if I name the wrong type when I go to
> access
> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do that
> if i tried to
> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a
> ByteArray#
> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing
> this.
> >>>>>>>>> >
> >>>>>>>>> > I've been hunting for ways to try to kill the indirection
> >>>>>>>>> > problems I get with Haskell and mutable structures, and I
> could shoehorn a
> >>>>>>>>> > number of them into ArrayArrays if this worked.
> >>>>>>>>> >
> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary
> >>>>>>>>> > indirection compared to c/java and this could reduce that pain
> to just 1
> >>>>>>>>> > level of unnecessary indirection.
> >>>>>>>>> >
> >>>>>>>>> > -Edward
> >>>>>>>>>
> >>>>>>>>> > _______________________________________________
> >>>>>>>>> > ghc-devs mailing list
> >>>>>>>>> > ghc-devs at haskell.org
> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> ghc-devs mailing list
> >>>>>>>>> ghc-devs at haskell.org
> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150828/f56df92e/attachment-0001.html>


More information about the ghc-devs mailing list