From johan.tibell at gmail.com Tue Sep 1 05:14:22 2015 From: johan.tibell at gmail.com (Johan Tibell) Date: Mon, 31 Aug 2015 22:14:22 -0700 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> Message-ID: Works for me. On Mon, Aug 31, 2015 at 3:50 PM, Ryan Yates wrote: > Any time works for me. > > Ryan > > On Mon, Aug 31, 2015 at 6:11 PM, Ryan Newton wrote: > > Dear Edward, Ryan Yates, and other interested parties -- > > > > So when should we meet up about this? > > > > May I propose the Tues afternoon break for everyone at ICFP who is > > interested in this topic? We can meet out in the coffee area and > congregate > > around Edward Kmett, who is tall and should be easy to find ;-). > > > > I think Ryan is going to show us how to use his new primops for combined > > array + other fields in one heap object? > > > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: > >> > >> Without a custom primitive it doesn't help much there, you have to store > >> the indirection to the mask. > >> > >> With a custom primitive it should cut the on heap root-to-leaf path of > >> everything in the HAMT in half. A shorter HashMap was actually one of > the > >> motivating factors for me doing this. It is rather astoundingly > difficult to > >> beat the performance of HashMap, so I had to start cheating pretty > badly. ;) > >> > >> -Edward > >> > >> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > >> wrote: > >>> > >>> I'd also be interested to chat at ICFP to see if I can use this for my > >>> HAMT implementation. > >>> > >>> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > wrote: > >>>> > >>>> Sounds good to me. 
Right now I'm just hacking up composable accessors > >>>> for "typed slots" in a fairly lens-like fashion, and treating the set > of > >>>> slots I define and the 'new' function I build for the data type as > its API, > >>>> and build atop that. This could eventually graduate to > template-haskell, but > >>>> I'm not entirely satisfied with the solution I have. I currently > distinguish > >>>> between what I'm calling "slots" (things that point directly to > another > >>>> SmallMutableArrayArray# sans wrapper) and "fields" which point > directly to > >>>> the usual Haskell data types because unifying the two notions meant > that I > >>>> couldn't lift some coercions out "far enough" to make them vanish. > >>>> > >>>> I'll be happy to run through my current working set of issues in > person > >>>> and -- as things get nailed down further -- in a longer lived medium > than in > >>>> personal conversations. ;) > >>>> > >>>> -Edward > >>>> > >>>> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > wrote: > >>>>> > >>>>> I'd also love to meet up at ICFP and discuss this. I think the array > >>>>> primops plus a TH layer that lets (ab)use them many times without > too much > >>>>> marginal cost sounds great. And I'd like to learn how we could be > either > >>>>> early users of, or help with, this infrastructure. > >>>>> > >>>>> CC'ing in Ryan Scot and Omer Agacan who may also be interested in > >>>>> dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. > student > >>>>> who is currently working on concurrent data structures in Haskell, > but will > >>>>> not be at ICFP. > >>>>> > >>>>> > >>>>> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > >>>>> wrote: > >>>>>> > >>>>>> I completely agree. I would love to spend some time during ICFP and > >>>>>> friends talking about what it could look like. My small array for > STM > >>>>>> changes for the RTS can be seen here [1]. 
It is on a branch > somewhere > >>>>>> between 7.8 and 7.10 and includes irrelevant STM bits and some > >>>>>> confusing naming choices (sorry), but should cover all the details > >>>>>> needed to implement it for a non-STM context. The biggest surprise > >>>>>> for me was following small array too closely and having a word/byte > >>>>>> offset mismatch [2]. > >>>>>> > >>>>>> [1]: > >>>>>> > https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut > >>>>>> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 > >>>>>> > >>>>>> Ryan > >>>>>> > >>>>>> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett > >>>>>> wrote: > >>>>>> > I'd love to have that last 10%, but it's a lot of work to get there > >>>>>> > and more > >>>>>> > importantly I don't know quite what it should look like. > >>>>>> > > >>>>>> > On the other hand, I do have a pretty good idea of how the > >>>>>> > primitives above > >>>>>> > could be banged out and tested in a long evening, well in time for > >>>>>> > 7.12. And > >>>>>> > as noted earlier, those remain useful even if a nicer typed > version > >>>>>> > with an > >>>>>> > extra level of indirection to the sizes is built up after. > >>>>>> > > >>>>>> > The rest sounds like a good graduate student project for someone > who > >>>>>> > has > >>>>>> > graduate students lying around. Maybe somebody at Indiana > University > >>>>>> > who has > >>>>>> > an interest in type theory and parallelism can find us one. =) > >>>>>> > > >>>>>> > -Edward > >>>>>> > > >>>>>> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates > >>>>>> > wrote: > >>>>>> >> > >>>>>> >> I think from my perspective, the motivation for getting the type > >>>>>> >> checker involved is primarily bringing this to the level where > >>>>>> >> users > >>>>>> >> could be expected to build these structures. 
It is reasonable to > >>>>>> >> think that there are people who want to use STM (a context with > >>>>>> >> mutation already) to implement a straightforward data structure > >>>>>> >> that > >>>>>> >> avoids extra indirection penalty. There should be some places > >>>>>> >> where > >>>>>> >> knowing that things are field accesses rather than array indexing > >>>>>> >> could be helpful, but I think GHC is good right now about > handling > >>>>>> >> constant offsets. In my code I don't do any bounds checking as I > >>>>>> >> know > >>>>>> >> I will only be accessing my arrays with constant indexes. I make > >>>>>> >> wrappers for each field access and leave all the unsafe stuff in > >>>>>> >> there. When things go wrong though, the compiler is no help. > >>>>>> >> Maybe > >>>>>> >> template Haskell that generates the appropriate wrappers is the > >>>>>> >> right > >>>>>> >> direction to go. > >>>>>> >> There is another benefit for me when working with these as arrays > >>>>>> >> in > >>>>>> >> that it is quite simple and direct (given the hoops already > jumped > >>>>>> >> through) to play with alignment. I can ensure two pointers are > >>>>>> >> never > >>>>>> >> on the same cache-line by just spacing things out in the array. > >>>>>> >> > >>>>>> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett > >>>>>> >> wrote: > >>>>>> >> > They just segfault at this level. ;) > >>>>>> >> > > >>>>>> >> > Sent from my iPhone > >>>>>> >> > > >>>>>> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton > >>>>>> >> > wrote: > >>>>>> >> > > >>>>>> >> > You presumably also save a bounds check on reads by hard-coding > >>>>>> >> > the > >>>>>> >> > sizes? > >>>>>> >> > > >>>>>> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett < > ekmett at gmail.com> > >>>>>> >> > wrote: > >>>>>> >> >> > >>>>>> >> >> Also there are 4 different "things" here, basically depending > on > >>>>>> >> >> two > >>>>>> >> >> independent questions: > >>>>>> >> >> > >>>>>> >> >> a.) 
if you want to shove the sizes into the info table, and > >>>>>> >> >> b.) if you want cardmarking. > >>>>>> >> >> > >>>>>> >> >> Versions with/without cardmarking for different sizes can be > >>>>>> >> >> done > >>>>>> >> >> pretty > >>>>>> >> >> easily, but as noted, the infotable variants are pretty > >>>>>> >> >> invasive. > >>>>>> >> >> > >>>>>> >> >> -Edward > >>>>>> >> >> > >>>>>> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett < > ekmett at gmail.com> > >>>>>> >> >> wrote: > >>>>>> >> >>> > >>>>>> >> >>> Well, on the plus side you'd save 16 bytes per object, which > >>>>>> >> >>> adds up > >>>>>> >> >>> if > >>>>>> >> >>> they were small enough and there are enough of them. You get > a > >>>>>> >> >>> bit > >>>>>> >> >>> better > >>>>>> >> >>> locality of reference in terms of what fits in the first > cache > >>>>>> >> >>> line of > >>>>>> >> >>> them. > >>>>>> >> >>> > >>>>>> >> >>> -Edward > >>>>>> >> >>> > >>>>>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >>>>>> >> >>> > >>>>>> >> >>> wrote: > >>>>>> >> >>>> > >>>>>> >> >>>> Yes. And for the short term I can imagine places we will > >>>>>> >> >>>> settle with > >>>>>> >> >>>> arrays even if it means tracking lengths unnecessarily and > >>>>>> >> >>>> unsafeCoercing > >>>>>> >> >>>> pointers whose types don't actually match their siblings. > >>>>>> >> >>>> > >>>>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed > >>>>>> >> >>>> sized > >>>>>> >> >>>> array > >>>>>> >> >>>> objects *other* than using them to fake structs? (Much to > >>>>>> >> >>>> derecommend, as > >>>>>> >> >>>> you mentioned!) > >>>>>> >> >>>> > >>>>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >>>>>> >> >>>> > >>>>>> >> >>>> wrote: > >>>>>> >> >>>>> > >>>>>> >> >>>>> I think both are useful, but the one you suggest requires a > >>>>>> >> >>>>> lot more > >>>>>> >> >>>>> plumbing and doesn't subsume all of the usecases of the > >>>>>> >> >>>>> other. 
> >>>>>> >> >>>>> > >>>>>> >> >>>>> -Edward > >>>>>> >> >>>>> > >>>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >>>>>> >> >>>>> > >>>>>> >> >>>>> wrote: > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> So that primitive is an array-like thing (same pointed type, > >>>>>> >> >>>>>> unbounded > >>>>>> >> >>>>>> length) with extra payload. > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, > >>>>>> >> >>>>>> especially > >>>>>> >> >>>>>> with the extra payload at front. But wouldn't the general > >>>>>> >> >>>>>> solution > >>>>>> >> >>>>>> for > >>>>>> >> >>>>>> structs be one that allows new user data type defs for > >>>>>> >> >>>>>> # > >>>>>> >> >>>>>> types? > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >>>>>> >> >>>>>> > >>>>>> >> >>>>>> wrote: > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words > >>>>>> >> >>>>>>> and a > >>>>>> >> >>>>>>> known > >>>>>> >> >>>>>>> number of pointers is basically what Ryan Yates was > >>>>>> >> >>>>>>> suggesting > >>>>>> >> >>>>>>> above, but > >>>>>> >> >>>>>>> where the word counts were stored in the objects > >>>>>> >> >>>>>>> themselves. > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts > >>>>>> >> >>>>>>> it'd > >>>>>> >> >>>>>>> likely > >>>>>> >> >>>>>>> want to be something we build in addition to MutVar# > rather > >>>>>> >> >>>>>>> than a > >>>>>> >> >>>>>>> replacement. > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and > build > >>>>>> >> >>>>>>> info > >>>>>> >> >>>>>>> tables that knew them, and typechecker support, for > >>>>>> >> >>>>>>> instance, it'd > >>>>>> >> >>>>>>> get > >>>>>> >> >>>>>>> rather invasive. 
> >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' > >>>>>> >> >>>>>>> versions > >>>>>> >> >>>>>>> above, like working with evil unsized c-style arrays > >>>>>> >> >>>>>>> directly > >>>>>> >> >>>>>>> inline at the > >>>>>> >> >>>>>>> end of the structure cease to be possible, so it isn't > even > >>>>>> >> >>>>>>> a pure > >>>>>> >> >>>>>>> win if we > >>>>>> >> >>>>>>> did the engineering effort. > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by > adding > >>>>>> >> >>>>>>> the one > >>>>>> >> >>>>>>> primitive. The last 10% gets pretty invasive. > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> -Edward > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> wrote: > >>>>>> >> >>>>>>>> > >>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable > >>>>>> >> >>>>>>>> structs > >>>>>> >> >>>>>>>> (like Ed said), and I'm trying to fully understand why > >>>>>> >> >>>>>>>> it's hard. > >>>>>> >> >>>>>>>> > >>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of > >>>>>> >> >>>>>>>> object > >>>>>> >> >>>>>>>> identity problems. But what about directly supporting an > >>>>>> >> >>>>>>>> extensible set of > >>>>>> >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even > >>>>>> >> >>>>>>>> replacing) > >>>>>> >> >>>>>>>> MutVar#? That > >>>>>> >> >>>>>>>> may be too much work, but is it problematic otherwise? > >>>>>> >> >>>>>>>> > >>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want > >>>>>> >> >>>>>>>> best in > >>>>>> >> >>>>>>>> class > >>>>>> >> >>>>>>>> lockfree mutable structures, just like their Stm and > >>>>>> >> >>>>>>>> sequential > >>>>>> >> >>>>>>>> counterparts. 
> >>>>>> >> >>>>>>>> > >>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones > >>>>>> >> >>>>>>>> wrote: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it > into a > >>>>>> >> >>>>>>>>> short > >>>>>> >> >>>>>>>>> article. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC > Trac, > >>>>>> >> >>>>>>>>> and > >>>>>> >> >>>>>>>>> maybe > >>>>>> >> >>>>>>>>> make a ticket for it. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Thanks > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Simon > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] > >>>>>> >> >>>>>>>>> Sent: 27 August 2015 16:54 > >>>>>> >> >>>>>>>>> To: Simon Peyton Jones > >>>>>> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs > >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified > >>>>>> >> >>>>>>>>> invariant. It > >>>>>> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or > >>>>>> >> >>>>>>>>> ByteArray#'s. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> While those live in #, they are garbage collected > >>>>>> >> >>>>>>>>> objects, so > >>>>>> >> >>>>>>>>> this > >>>>>> >> >>>>>>>>> all lives on the heap. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when > >>>>>> >> >>>>>>>>> it has > >>>>>> >> >>>>>>>>> to > >>>>>> >> >>>>>>>>> deal with nested arrays. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a > better > >>>>>> >> >>>>>>>>> thing. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> The Problem > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> ----------------- > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Consider the scenario where you write a classic > >>>>>> >> >>>>>>>>> doubly-linked > >>>>>> >> >>>>>>>>> list > >>>>>> >> >>>>>>>>> in Haskell. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 > >>>>>> >> >>>>>>>>> pointers > >>>>>> >> >>>>>>>>> on > >>>>>> >> >>>>>>>>> the heap. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe > DLL) > >>>>>> >> >>>>>>>>> ~> > >>>>>> >> >>>>>>>>> Maybe > >>>>>> >> >>>>>>>>> DLL ~> DLL > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> That is 3 levels of indirection. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with > >>>>>> >> >>>>>>>>> -funbox-strict-fields or UNPACK > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for > DLL > >>>>>> >> >>>>>>>>> and > >>>>>> >> >>>>>>>>> worsening our representation. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> This means that every operation we perform on this > >>>>>> >> >>>>>>>>> structure > >>>>>> >> >>>>>>>>> will > >>>>>> >> >>>>>>>>> be about half of the speed of an implementation in most > >>>>>> >> >>>>>>>>> other > >>>>>> >> >>>>>>>>> languages > >>>>>> >> >>>>>>>>> assuming we're memory bound on loading things into > cache! > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Making Progress > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> ---------------------- > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I have been working on a number of data structures > where > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> indirection of going from something in * out to an > object > >>>>>> >> >>>>>>>>> in # > >>>>>> >> >>>>>>>>> which > >>>>>> >> >>>>>>>>> contains the real pointer to my target and coming back > >>>>>> >> >>>>>>>>> effectively doubles > >>>>>> >> >>>>>>>>> my runtime. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> MutVar# > >>>>>> >> >>>>>>>>> onto the mutable list when we dirty it. There is a well > >>>>>> >> >>>>>>>>> defined > >>>>>> >> >>>>>>>>> write-barrier. 
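[Editor's note: for readers following along, here is a minimal runnable sketch of the boxed, Nil-carrying representation discussed above. The helper names (`newNode`, `next`, `setNext`) are mine, not from the thread; the point is that every hop still crosses one MutVar# indirection.]

```haskell
import Data.IORef

-- The 'Nil'-constructor representation from the thread: each node holds
-- prev/next IORefs, so a hop is DLL ~> MutVar# ~> DLL (one indirection).
data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil

-- Allocate a detached node whose links start out as Nil.
newNode :: IO DLL
newNode = do
  p <- newIORef Nil
  n <- newIORef Nil
  pure (DLL p n)

-- Follow the forward link; Nil is treated as its own successor.
next :: DLL -> IO DLL
next (DLL _ n) = readIORef n
next Nil       = pure Nil

-- Link a before b (forward direction only, for brevity).
setNext :: DLL -> DLL -> IO ()
setNext (DLL _ n) b = writeIORef n b
setNext Nil       _ = pure ()
```

Walking the list is then `next a >>= next` and so on, each step paying the extra MutVar# hop that the rest of the thread is trying to eliminate.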
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I could change out the representation to use > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# > every > >>>>>> >> >>>>>>>>> time, > >>>>>> >> >>>>>>>>> but > >>>>>> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the > >>>>>> >> >>>>>>>>> amount of > >>>>>> >> >>>>>>>>> distinct > >>>>>> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 > >>>>>> >> >>>>>>>>> per > >>>>>> >> >>>>>>>>> object to 2. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get > to > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> array > >>>>>> >> >>>>>>>>> object and then chase it to the next DLL and chase that > >>>>>> >> >>>>>>>>> to the > >>>>>> >> >>>>>>>>> next array. I > >>>>>> >> >>>>>>>>> do get my two pointers together in memory though. 
I'm > >>>>>> >> >>>>>>>>> paying for > >>>>>> >> >>>>>>>>> a card > >>>>>> >> >>>>>>>>> marking table as well, which I don't particularly need > >>>>>> >> >>>>>>>>> with just > >>>>>> >> >>>>>>>>> two > >>>>>> >> >>>>>>>>> pointers, but we can shed that with the > >>>>>> >> >>>>>>>>> "SmallMutableArray#" > >>>>>> >> >>>>>>>>> machinery added > >>>>>> >> >>>>>>>>> back in 7.10, which is just the old array code as a new > >>>>>> >> >>>>>>>>> data > >>>>>> >> >>>>>>>>> type, which can > >>>>>> >> >>>>>>>>> speed things up a bit when you don't have very big > >>>>>> >> >>>>>>>>> arrays: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and > >>>>>> >> >>>>>>>>> have two > >>>>>> >> >>>>>>>>> mutable fields and be able to share the same write > >>>>>> >> >>>>>>>>> barrier? > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array > >>>>>> >> >>>>>>>>> types. > >>>>>> >> >>>>>>>>> What > >>>>>> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal > with > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> impedance > >>>>>> >> >>>>>>>>> mismatch between the imperative world and Haskell, and > >>>>>> >> >>>>>>>>> then just > >>>>>> >> >>>>>>>>> let the > >>>>>> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make > be > >>>>>> >> >>>>>>>>> a > >>>>>> >> >>>>>>>>> special > >>>>>> >> >>>>>>>>> MutableArrayArray# I allocate on program startup. 
I can > >>>>>> >> >>>>>>>>> even > >>>>>> >> >>>>>>>>> abuse pattern > >>>>>> >> >>>>>>>>> synonyms. Alternately I can exploit the internals > further > >>>>>> >> >>>>>>>>> to > >>>>>> >> >>>>>>>>> make this > >>>>>> >> >>>>>>>>> cheaper. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and > >>>>>> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the > >>>>>> >> >>>>>>>>> preceding > >>>>>> >> >>>>>>>>> and next > >>>>>> >> >>>>>>>>> entry in the linked list. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps > me' > >>>>>> >> >>>>>>>>> into a > >>>>>> >> >>>>>>>>> strict world, and everything there lives in #. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> next :: DLL -> IO DLL > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s > >>>>>> >> >>>>>>>>> of > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that > >>>>>> >> >>>>>>>>> code to > >>>>>> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed > >>>>>> >> >>>>>>>>> pretty > >>>>>> >> >>>>>>>>> easily when they > >>>>>> >> >>>>>>>>> are known strict and you chain operations of this sort! > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Cleaning it Up > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> ------------------ > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Now I have one outermost indirection pointing to an > array > >>>>>> >> >>>>>>>>> that > >>>>>> >> >>>>>>>>> points directly to other arrays. 
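[Editor's note: the same "bootstrap wrapper" pattern can be tried today, without custom primops, using the SmallMutableArray# variant given earlier in the thread (`data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil`). The slot layout (0 = prev, 1 = next) and the helper names below are my assumptions, not from the thread:]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- The SmallMutableArray# representation from the thread; the array's two
-- slots (0 = prev, 1 = next -- an assumed layout) hold DLL values.
data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil

-- Allocate a detached 2-slot node, both links initially Nil.
mkNode :: IO DLL
mkNode = IO $ \s -> case newSmallArray# 2# Nil s of
  (# s', m #) -> (# s', DLL m #)

-- One pattern match and we are back in #; GHC erases the wrapper
-- when calls like this are chained.
next :: DLL -> IO DLL
next (DLL m) = IO (readSmallArray# m 1#)
next Nil     = pure Nil

setNext :: DLL -> DLL -> IO ()
setNext (DLL m) d = IO $ \s -> (# writeSmallArray# m 1# d s, () #)
setNext Nil     _ = pure ()

-- Object identity, in the spirit of sameMutableArrayArray# below.
same :: DLL -> DLL -> Bool
same (DLL a) (DLL b) = isTrue# (sameSmallMutableArray# a b)
same Nil     Nil     = True
same _       _       = False
```

This keeps the lifted elements of the thread's SmallMutableArray# version; the ArrayArray# version discussed next goes further by letting the slots point at unlifted arrays directly.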
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, > but > >>>>>> >> >>>>>>>>> I can > >>>>>> >> >>>>>>>>> fix > >>>>>> >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and > >>>>>> >> >>>>>>>>> using a > >>>>>> >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me > >>>>>> >> >>>>>>>>> store a > >>>>>> >> >>>>>>>>> mixture of > >>>>>> >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data > >>>>>> >> >>>>>>>>> structure. > >>>>>> >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> existing > >>>>>> >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one > >>>>>> >> >>>>>>>>> of the > >>>>>> >> >>>>>>>>> arguments it > >>>>>> >> >>>>>>>>> takes. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> This is almost ideal, but not quite. I often have > fields > >>>>>> >> >>>>>>>>> that > >>>>>> >> >>>>>>>>> would > >>>>>> >> >>>>>>>>> be best left unboxed. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can > >>>>>> >> >>>>>>>>> currently > >>>>>> >> >>>>>>>>> at > >>>>>> >> >>>>>>>>> best point one of the entries of the SmallMutableArray# > >>>>>> >> >>>>>>>>> at a > >>>>>> >> >>>>>>>>> boxed or at a > >>>>>> >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove > the > >>>>>> >> >>>>>>>>> int in > >>>>>> >> >>>>>>>>> question in > >>>>>> >> >>>>>>>>> there. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> e.g. 
if I were to implement a hash-array-mapped-trie I > >>>>>> >> >>>>>>>>> need to > >>>>>> >> >>>>>>>>> store masks and administrivia as I walk down the tree. > >>>>>> >> >>>>>>>>> Having to > >>>>>> >> >>>>>>>>> go off to > >>>>>> >> >>>>>>>>> the side costs me the entire win from avoiding the > first > >>>>>> >> >>>>>>>>> pointer > >>>>>> >> >>>>>>>>> chase. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we > >>>>>> >> >>>>>>>>> could > >>>>>> >> >>>>>>>>> construct that had n words with unsafe access and m > >>>>>> >> >>>>>>>>> pointers to > >>>>>> >> >>>>>>>>> other heap > >>>>>> >> >>>>>>>>> objects, one that could put itself on the mutable list > >>>>>> >> >>>>>>>>> when any > >>>>>> >> >>>>>>>>> of those > >>>>>> >> >>>>>>>>> pointers changed then I could shed this last factor of > >>>>>> >> >>>>>>>>> two in > >>>>>> >> >>>>>>>>> all > >>>>>> >> >>>>>>>>> circumstances. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Prototype > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> ------------- > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Over the last few days I've put together a small > >>>>>> >> >>>>>>>>> prototype > >>>>>> >> >>>>>>>>> implementation with a few non-trivial imperative data > >>>>>> >> >>>>>>>>> structures > >>>>>> >> >>>>>>>>> for things > >>>>>> >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem > >>>>>> >> >>>>>>>>> and > >>>>>> >> >>>>>>>>> order-maintenance. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> https://github.com/ekmett/structs > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Notable bits: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation > >>>>>> >> >>>>>>>>> of > >>>>>> >> >>>>>>>>> link-cut > >>>>>> >> >>>>>>>>> trees in this style. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying > guts > >>>>>> >> >>>>>>>>> that > >>>>>> >> >>>>>>>>> make > >>>>>> >> >>>>>>>>> it go fast. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, > >>>>>> >> >>>>>>>>> almost > >>>>>> >> >>>>>>>>> all > >>>>>> >> >>>>>>>>> the references to the LinkCut or Object data > constructor > >>>>>> >> >>>>>>>>> get > >>>>>> >> >>>>>>>>> optimized away, > >>>>>> >> >>>>>>>>> and we're left with beautiful strict code directly > >>>>>> >> >>>>>>>>> mutating our > >>>>>> >> >>>>>>>>> underlying > >>>>>> >> >>>>>>>>> representation. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it > into a > >>>>>> >> >>>>>>>>> short > >>>>>> >> >>>>>>>>> article. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> -Edward > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones > >>>>>> >> >>>>>>>>> wrote: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Just to say that I have no idea what is going on in > this > >>>>>> >> >>>>>>>>> thread. > >>>>>> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is > >>>>>> >> >>>>>>>>> there a > >>>>>> >> >>>>>>>>> ticket? 
Is > >>>>>> >> >>>>>>>>> there a wiki page? > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket > would > >>>>>> >> >>>>>>>>> be a > >>>>>> >> >>>>>>>>> good > >>>>>> >> >>>>>>>>> thing. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Simon > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] > On > >>>>>> >> >>>>>>>>> Behalf > >>>>>> >> >>>>>>>>> Of > >>>>>> >> >>>>>>>>> Edward Kmett > >>>>>> >> >>>>>>>>> Sent: 21 August 2015 05:25 > >>>>>> >> >>>>>>>>> To: Manuel M T Chakravarty > >>>>>> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs > >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's > >>>>>> >> >>>>>>>>> would be > >>>>>> >> >>>>>>>>> very handy as well. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Right now, for something like an > >>>>>> >> >>>>>>>>> order-maintenance > >>>>>> >> >>>>>>>>> structure I have: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray > s) > >>>>>> >> >>>>>>>>> {-# > >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} > !(MutVar > >>>>>> >> >>>>>>>>> s > >>>>>> >> >>>>>>>>> (Upper s)) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper > s)) > >>>>>> >> >>>>>>>>> {-# > >>>>>> >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} > !(MutVar > >>>>>> >> >>>>>>>>> s > >>>>>> >> >>>>>>>>> (Lower s)) {-# > >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and > two > >>>>>> >> >>>>>>>>> pointers, > >>>>>> >> >>>>>>>>> one for forward and one for backwards. The latter is > >>>>>> >> >>>>>>>>> basically > >>>>>> >> >>>>>>>>> the same > >>>>>> >> >>>>>>>>> thing with a mutable reference up pointing at the > >>>>>> >> >>>>>>>>> structure > >>>>>> >> >>>>>>>>> above. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> On the heap this is an object that points to a > structure > >>>>>> >> >>>>>>>>> for the > >>>>>> >> >>>>>>>>> bytearray, and points to another structure for each > >>>>>> >> >>>>>>>>> mutvar which > >>>>>> >> >>>>>>>>> each point > >>>>>> >> >>>>>>>>> to the other 'Upper' structure. So there is a level of > >>>>>> >> >>>>>>>>> indirection smeared > >>>>>> >> >>>>>>>>> over everything. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward > >>>>>> >> >>>>>>>>> link > >>>>>> >> >>>>>>>>> from > >>>>>> >> >>>>>>>>> the structure below to the structure above. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a > MutableByteArray#, > >>>>>> >> >>>>>>>>> and > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> next 2 slots pointing to the previous and next > >>>>>> >> >>>>>>>>> objects, > >>>>>> >> >>>>>>>>> represented > >>>>>> >> >>>>>>>>> just as their MutableArrayArray#s. I can use > >>>>>> >> >>>>>>>>> sameMutableArrayArray# on these > >>>>>> >> >>>>>>>>> for object identity, which lets me check for the ends > of > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> lists by tying > >>>>>> >> >>>>>>>>> things back on themselves. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> and below that > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot > pointing > >>>>>> >> >>>>>>>>> up to > >>>>>> >> >>>>>>>>> an > >>>>>> >> >>>>>>>>> upper structure. 
> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting > out > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> slots > >>>>>> >> >>>>>>>>> in question, while it has gained a level of indirection > >>>>>> >> >>>>>>>>> between > >>>>>> >> >>>>>>>>> the wrapper > >>>>>> >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that > >>>>>> >> >>>>>>>>> one can > >>>>>> >> >>>>>>>>> be basically > >>>>>> >> >>>>>>>>> erased by ghc. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on > >>>>>> >> >>>>>>>>> the heap > >>>>>> >> >>>>>>>>> for > >>>>>> >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# > for > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> object itself, > >>>>>> >> >>>>>>>>> and the MutableByteArray# that it references to carry > >>>>>> >> >>>>>>>>> around the > >>>>>> >> >>>>>>>>> mutable > >>>>>> >> >>>>>>>>> int. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> The only pain points are > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> 1.) the aforementioned limitation that currently > prevents > >>>>>> >> >>>>>>>>> me > >>>>>> >> >>>>>>>>> from > >>>>>> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or > Array > >>>>>> >> >>>>>>>>> into an > >>>>>> >> >>>>>>>>> ArrayArray > >>>>>> >> >>>>>>>>> leaving me in a little ghetto disconnected from the > rest > >>>>>> >> >>>>>>>>> of > >>>>>> >> >>>>>>>>> Haskell, > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> and > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us > >>>>>> >> >>>>>>>>> avoid the > >>>>>> >> >>>>>>>>> card marking overhead. 
These objects are all small, 3-4 > >>>>>> >> >>>>>>>>> pointers > >>>>>> >> >>>>>>>>> wide. Card > >>>>>> >> >>>>>>>>> marking doesn't help. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things > and > >>>>>> >> >>>>>>>>> convert > >>>>>> >> >>>>>>>>> the whole mess to SmallArrays and then figure out how > to > >>>>>> >> >>>>>>>>> unsafeCoerce my way > >>>>>> >> >>>>>>>>> to glory, stuffing the #'d references to the other > arrays > >>>>>> >> >>>>>>>>> directly into the > >>>>>> >> >>>>>>>>> SmallArray as slots, removing the limitation we see > here > >>>>>> >> >>>>>>>>> by > >>>>>> >> >>>>>>>>> aping the > >>>>>> >> >>>>>>>>> MutableArrayArray# s API, but that gets really really > >>>>>> >> >>>>>>>>> dangerous! > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on > >>>>>> >> >>>>>>>>> the > >>>>>> >> >>>>>>>>> altar > >>>>>> >> >>>>>>>>> of speed here, but I'd like to be able to let the GC > move > >>>>>> >> >>>>>>>>> them > >>>>>> >> >>>>>>>>> and collect > >>>>>> >> >>>>>>>>> them which rules out simpler Ptr and Addr based > >>>>>> >> >>>>>>>>> solutions. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> -Edward > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty > >>>>>> >> >>>>>>>>> wrote: > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> That?s an interesting idea. > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> Manuel > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > Edward Kmett : > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > Would it be possible to add unsafe primops to add > >>>>>> >> >>>>>>>>> > Array# and > >>>>>> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? 
The fact that > >>>>>> >> >>>>>>>>> > the > >>>>>> >> >>>>>>>>> > ArrayArray# entries > >>>>>> >> >>>>>>>>> > are all directly unlifted avoiding a level of > >>>>>> >> >>>>>>>>> > indirection for > >>>>>> >> >>>>>>>>> > the containing > >>>>>> >> >>>>>>>>> > structure is amazing, but I can only currently use it > >>>>>> >> >>>>>>>>> > if my > >>>>>> >> >>>>>>>>> > leaf level data > >>>>>> >> >>>>>>>>> > can be 100% unboxed and distributed among > ByteArray#s. > >>>>>> >> >>>>>>>>> > It'd be > >>>>>> >> >>>>>>>>> > nice to be > >>>>>> >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff > >>>>>> >> >>>>>>>>> > down at > >>>>>> >> >>>>>>>>> > the leaves to > >>>>>> >> >>>>>>>>> > hold lifted contents. > >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > I accept fully that if I name the wrong type when I > go > >>>>>> >> >>>>>>>>> > to > >>>>>> >> >>>>>>>>> > access > >>>>>> >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd > >>>>>> >> >>>>>>>>> > do that > >>>>>> >> >>>>>>>>> > if i tried to > >>>>>> >> >>>>>>>>> > use one of the members that held a nested ArrayArray# > >>>>>> >> >>>>>>>>> > as a > >>>>>> >> >>>>>>>>> > ByteArray# > >>>>>> >> >>>>>>>>> > anyways, so it isn't like there is a safety story > >>>>>> >> >>>>>>>>> > preventing > >>>>>> >> >>>>>>>>> > this. > >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > I've been hunting for ways to try to kill the > >>>>>> >> >>>>>>>>> > indirection > >>>>>> >> >>>>>>>>> > problems I get with Haskell and mutable structures, > and > >>>>>> >> >>>>>>>>> > I > >>>>>> >> >>>>>>>>> > could shoehorn a > >>>>>> >> >>>>>>>>> > number of them into ArrayArrays if this worked. 
> >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of > >>>>>> >> >>>>>>>>> > unnecessary > >>>>>> >> >>>>>>>>> > indirection compared to c/java and this could reduce > >>>>>> >> >>>>>>>>> > that pain > >>>>>> >> >>>>>>>>> > to just 1 > >>>>>> >> >>>>>>>>> > level of unnecessary indirection. > >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > -Edward > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > _______________________________________________ > >>>>>> >> >>>>>>>>> > ghc-devs mailing list > >>>>>> >> >>>>>>>>> > ghc-devs at haskell.org > >>>>>> >> >>>>>>>>> > > >>>>>> >> >>>>>>>>> > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> _______________________________________________ > >>>>>> >> >>>>>>>>> ghc-devs mailing list > >>>>>> >> >>>>>>>>> ghc-devs at haskell.org > >>>>>> >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> > >>>>>> >> >>>>> > >>>>>> >> >>> > >>>>>> >> >> > >>>>>> >> > > >>>>>> >> > > >>>>>> >> > _______________________________________________ > >>>>>> >> > ghc-devs mailing list > >>>>>> >> > ghc-devs at haskell.org > >>>>>> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >>>>>> >> > > >>>>>> > > >>>>>> > > >>>>> > >>>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> ghc-devs mailing list > >>>> ghc-devs at haskell.org > >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >>>> > >>> > >> > > > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eir at cis.upenn.edu Tue Sep 1 06:45:40 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Mon, 31 Aug 2015 23:45:40 -0700 Subject: more releases Message-ID: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> Hi devs, An interesting topic came up over dinner tonight: what if GHC made more releases? As an extreme example, we could release a new point version every time a bug fix gets merged to the stable branch. This may be a terrible idea. But what's stopping us from doing so? The biggest objection I can see is that we would want to make sure that users' code would work with the new version. Could the Stackage crew help us with this? If they run their nightly build with a release candidate and diff against the prior results, we would get a pretty accurate sense of whether the bugfix is good. If this test succeeds, why not release? Would it be hard to automate the packaging/posting process? The advantage to more releases is that it gets bugfixes in more hands sooner. What are the disadvantages? Richard PS: I'm not 100% sold on this idea. But I thought it was interesting enough to raise a broader discussion. From ekmett at gmail.com Tue Sep 1 06:50:14 2015 From: ekmett at gmail.com (Edward Kmett) Date: Mon, 31 Aug 2015 23:50:14 -0700 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> Message-ID: Works for me. On Mon, Aug 31, 2015 at 10:14 PM, Johan Tibell wrote: > Works for me. > > On Mon, Aug 31, 2015 at 3:50 PM, Ryan Yates wrote: > >> Any time works for me. >> >> Ryan >> >> On Mon, Aug 31, 2015 at 6:11 PM, Ryan Newton wrote: >> > Dear Edward, Ryan Yates, and other interested parties -- >> > >> > So when should we meet up about this? >> > >> > May I propose the Tues afternoon break for everyone at ICFP who is >> > interested in this topic? 
We can meet out in the coffee area and >> congregate >> > around Edward Kmett, who is tall and should be easy to find ;-). >> > >> > I think Ryan is going to show us how to use his new primops for combined >> > array + other fields in one heap object? >> > >> > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: >> >> >> >> Without a custom primitive it doesn't help much there, you have to >> store >> >> the indirection to the mask. >> >> >> >> With a custom primitive it should cut the on heap root-to-leaf path of >> >> everything in the HAMT in half. A shorter HashMap was actually one of >> the >> >> motivating factors for me doing this. It is rather astoundingly >> difficult to >> >> beat the performance of HashMap, so I had to start cheating pretty >> badly. ;) >> >> >> >> -Edward >> >> >> >> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell >> >> wrote: >> >>> >> >>> I'd also be interested to chat at ICFP to see if I can use this for my >> >>> HAMT implementation. >> >>> >> >>> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett >> wrote: >> >>>> >> >>>> Sounds good to me. Right now I'm just hacking up composable accessors >> >>>> for "typed slots" in a fairly lens-like fashion, and treating the >> set of >> >>>> slots I define and the 'new' function I build for the data type as >> its API, >> >>>> and build atop that. This could eventually graduate to >> template-haskell, but >> >>>> I'm not entirely satisfied with the solution I have. I currently >> distinguish >> >>>> between what I'm calling "slots" (things that point directly to >> another >> >>>> SmallMutableArrayArray# sans wrapper) and "fields" which point >> directly to >> >>>> the usual Haskell data types because unifying the two notions meant >> that I >> >>>> couldn't lift some coercions out "far enough" to make them vanish. 
>> >>>> >> >>>> I'll be happy to run through my current working set of issues in >> person >> >>>> and -- as things get nailed down further -- in a longer lived medium >> than in >> >>>> personal conversations. ;) >> >>>> >> >>>> -Edward >> >>>> >> >>>> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton >> wrote: >> >>>>> >> >>>>> I'd also love to meet up at ICFP and discuss this. I think the >> array >> >>>>> primops plus a TH layer that lets (ab)use them many times without >> too much >> >>>>> marginal cost sounds great. And I'd like to learn how we could be >> either >> >>>>> early users of, or help with, this infrastructure. >> >>>>> >> >>>>> CC'ing in Ryan Scot and Omer Agacan who may also be interested in >> >>>>> dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. >> student >> >>>>> who is currently working on concurrent data structures in Haskell, >> but will >> >>>>> not be at ICFP. >> >>>>> >> >>>>> >> >>>>> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates >> >>>>> wrote: >> >>>>>> >> >>>>>> I completely agree. I would love to spend some time during ICFP >> and >> >>>>>> friends talking about what it could look like. My small array for >> STM >> >>>>>> changes for the RTS can be seen here [1]. It is on a branch >> somewhere >> >>>>>> between 7.8 and 7.10 and includes irrelevant STM bits and some >> >>>>>> confusing naming choices (sorry), but should cover all the details >> >>>>>> needed to implement it for a non-STM context. The biggest surprise >> >>>>>> for me was following small array too closely and having a word/byte >> >>>>>> offset miss-match [2]. 
>> >>>>>> >> >>>>>> [1]: >> >>>>>> >> https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut >> >>>>>> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 >> >>>>>> >> >>>>>> Ryan >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett >> >>>>>> wrote: >> >>>>>> > I'd love to have that last 10%, but its a lot of work to get >> there >> >>>>>> > and more >> >>>>>> > importantly I don't know quite what it should look like. >> >>>>>> > >> >>>>>> > On the other hand, I do have a pretty good idea of how the >> >>>>>> > primitives above >> >>>>>> > could be banged out and tested in a long evening, well in time >> for >> >>>>>> > 7.12. And >> >>>>>> > as noted earlier, those remain useful even if a nicer typed >> version >> >>>>>> > with an >> >>>>>> > extra level of indirection to the sizes is built up after. >> >>>>>> > >> >>>>>> > The rest sounds like a good graduate student project for someone >> who >> >>>>>> > has >> >>>>>> > graduate students lying around. Maybe somebody at Indiana >> University >> >>>>>> > who has >> >>>>>> > an interest in type theory and parallelism can find us one. =) >> >>>>>> > >> >>>>>> > -Edward >> >>>>>> > >> >>>>>> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates > > >> >>>>>> > wrote: >> >>>>>> >> >> >>>>>> >> I think from my perspective, the motivation for getting the type >> >>>>>> >> checker involved is primarily bringing this to the level where >> >>>>>> >> users >> >>>>>> >> could be expected to build these structures. it is reasonable >> to >> >>>>>> >> think that there are people who want to use STM (a context with >> >>>>>> >> mutation already) to implement a straight forward data structure >> >>>>>> >> that >> >>>>>> >> avoids extra indirection penalty. There should be some places >> >>>>>> >> where >> >>>>>> >> knowing that things are field accesses rather then array >> indexing >> >>>>>> >> could be helpful, but I think GHC is good right now about >> handling >> >>>>>> >> constant offsets. 
In my code I don't do any bounds checking as >> I >> >>>>>> >> know >> >>>>>> >> I will only be accessing my arrays with constant indexes. I >> make >> >>>>>> >> wrappers for each field access and leave all the unsafe stuff in >> >>>>>> >> there. When things go wrong though, the compiler is no help. >> >>>>>> >> Maybe >> >>>>>> >> template Haskell that generates the appropriate wrappers is the >> >>>>>> >> right >> >>>>>> >> direction to go. >> >>>>>> >> There is another benefit for me when working with these as >> arrays >> >>>>>> >> in >> >>>>>> >> that it is quite simple and direct (given the hoops already >> jumped >> >>>>>> >> through) to play with alignment. I can ensure two pointers are >> >>>>>> >> never >> >>>>>> >> on the same cache-line by just spacing things out in the array. >> >>>>>> >> >> >>>>>> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett > > >> >>>>>> >> wrote: >> >>>>>> >> > They just segfault at this level. ;) >> >>>>>> >> > >> >>>>>> >> > Sent from my iPhone >> >>>>>> >> > >> >>>>>> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton >> >>>>>> >> > wrote: >> >>>>>> >> > >> >>>>>> >> > You presumably also save a bounds check on reads by >> hard-coding >> >>>>>> >> > the >> >>>>>> >> > sizes? >> >>>>>> >> > >> >>>>>> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett < >> ekmett at gmail.com> >> >>>>>> >> > wrote: >> >>>>>> >> >> >> >>>>>> >> >> Also there are 4 different "things" here, basically >> depending on >> >>>>>> >> >> two >> >>>>>> >> >> independent questions: >> >>>>>> >> >> >> >>>>>> >> >> a.) if you want to shove the sizes into the info table, and >> >>>>>> >> >> b.) if you want cardmarking. >> >>>>>> >> >> >> >>>>>> >> >> Versions with/without cardmarking for different sizes can be >> >>>>>> >> >> done >> >>>>>> >> >> pretty >> >>>>>> >> >> easily, but as noted, the infotable variants are pretty >> >>>>>> >> >> invasive. 
>> >>>>>> >> >> >> >>>>>> >> >> -Edward >> >>>>>> >> >> >> >>>>>> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett < >> ekmett at gmail.com> >> >>>>>> >> >> wrote: >> >>>>>> >> >>> >> >>>>>> >> >>> Well, on the plus side you'd save 16 bytes per object, which >> >>>>>> >> >>> adds up >> >>>>>> >> >>> if >> >>>>>> >> >>> they were small enough and there are enough of them. You >> get a >> >>>>>> >> >>> bit >> >>>>>> >> >>> better >> >>>>>> >> >>> locality of reference in terms of what fits in the first >> cache >> >>>>>> >> >>> line of >> >>>>>> >> >>> them. >> >>>>>> >> >>> >> >>>>>> >> >>> -Edward >> >>>>>> >> >>> >> >>>>>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton >> >>>>>> >> >>> >> >>>>>> >> >>> wrote: >> >>>>>> >> >>>> >> >>>>>> >> >>>> Yes. And for the short term I can imagine places we will >> >>>>>> >> >>>> settle with >> >>>>>> >> >>>> arrays even if it means tracking lengths unnecessarily and >> >>>>>> >> >>>> unsafeCoercing >> >>>>>> >> >>>> pointers whose types don't actually match their siblings. >> >>>>>> >> >>>> >> >>>>>> >> >>>> Is there anything to recommend the hacks mentioned for >> fixed >> >>>>>> >> >>>> sized >> >>>>>> >> >>>> array >> >>>>>> >> >>>> objects *other* than using them to fake structs? (Much to >> >>>>>> >> >>>> derecommend, as >> >>>>>> >> >>>> you mentioned!) >> >>>>>> >> >>>> >> >>>>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett >> >>>>>> >> >>>> >> >>>>>> >> >>>> wrote: >> >>>>>> >> >>>>> >> >>>>>> >> >>>>> I think both are useful, but the one you suggest requires >> a >> >>>>>> >> >>>>> lot more >> >>>>>> >> >>>>> plumbing and doesn't subsume all of the usecases of the >> >>>>>> >> >>>>> other. 
>> >>>>>> >> >>>>> >> >>>>>> >> >>>>> -Edward >> >>>>>> >> >>>>> >> >>>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton >> >>>>>> >> >>>>> >> >>>>>> >> >>>>> wrote: >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> So that primitive is an array like thing (Same pointed >> type, >> >>>>>> >> >>>>>> unbounded >> >>>>>> >> >>>>>> length) with extra payload. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> I can see how we can do without structs if we have >> arrays, >> >>>>>> >> >>>>>> especially >> >>>>>> >> >>>>>> with the extra payload at front. But wouldn't the general >> >>>>>> >> >>>>>> solution >> >>>>>> >> >>>>>> for >> >>>>>> >> >>>>>> structs be one that that allows new user data type defs >> for >> >>>>>> >> >>>>>> # >> >>>>>> >> >>>>>> types? >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> wrote: >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words >> >>>>>> >> >>>>>>> and a >> >>>>>> >> >>>>>>> known >> >>>>>> >> >>>>>>> number of pointers is basically what Ryan Yates was >> >>>>>> >> >>>>>>> suggesting >> >>>>>> >> >>>>>>> above, but >> >>>>>> >> >>>>>>> where the word counts were stored in the objects >> >>>>>> >> >>>>>>> themselves. >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts >> >>>>>> >> >>>>>>> it'd >> >>>>>> >> >>>>>>> likely >> >>>>>> >> >>>>>>> want to be something we build in addition to MutVar# >> rather >> >>>>>> >> >>>>>>> than a >> >>>>>> >> >>>>>>> replacement. >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and >> build >> >>>>>> >> >>>>>>> info >> >>>>>> >> >>>>>>> tables that knew them, and typechecker support, for >> >>>>>> >> >>>>>>> instance, it'd >> >>>>>> >> >>>>>>> get >> >>>>>> >> >>>>>>> rather invasive. 
>> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' >> >>>>>> >> >>>>>>> versions >> >>>>>> >> >>>>>>> above, like working with evil unsized c-style arrays >> >>>>>> >> >>>>>>> directly >> >>>>>> >> >>>>>>> inline at the >> >>>>>> >> >>>>>>> end of the structure cease to be possible, so it isn't >> even >> >>>>>> >> >>>>>>> a pure >> >>>>>> >> >>>>>>> win if we >> >>>>>> >> >>>>>>> did the engineering effort. >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by >> adding >> >>>>>> >> >>>>>>> the one >> >>>>>> >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> -Edward >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> wrote: >> >>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>>>> I like the possibility of a general solution for >> mutable >> >>>>>> >> >>>>>>>> structs >> >>>>>> >> >>>>>>>> (like Ed said), and I'm trying to fully understand why >> >>>>>> >> >>>>>>>> it's hard. >> >>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of >> >>>>>> >> >>>>>>>> object >> >>>>>> >> >>>>>>>> identity problems. But what about directly supporting >> an >> >>>>>> >> >>>>>>>> extensible set of >> >>>>>> >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even >> >>>>>> >> >>>>>>>> replacing) >> >>>>>> >> >>>>>>>> MutVar#? That >> >>>>>> >> >>>>>>>> may be too much work, but is it problematic otherwise? >> >>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want >> >>>>>> >> >>>>>>>> best in >> >>>>>> >> >>>>>>>> class >> >>>>>> >> >>>>>>>> lockfree mutable structures, just like their Stm and >> >>>>>> >> >>>>>>>> sequential >> >>>>>> >> >>>>>>>> counterparts. 
>> >>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >>>>>> >> >>>>>>>> wrote: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it >> into a >> >>>>>> >> >>>>>>>>> short >> >>>>>> >> >>>>>>>>> article. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC >> Trac, >> >>>>>> >> >>>>>>>>> and >> >>>>>> >> >>>>>>>>> maybe >> >>>>>> >> >>>>>>>>> make a ticket for it. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Thanks >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Simon >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] >> >>>>>> >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >>>>>> >> >>>>>>>>> To: Simon Peyton Jones >> >>>>>> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >> >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified >> >>>>>> >> >>>>>>>>> invariant. It >> >>>>>> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or >> >>>>>> >> >>>>>>>>> ByteArray#'s. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> While those live in #, they are garbage collected >> >>>>>> >> >>>>>>>>> objects, so >> >>>>>> >> >>>>>>>>> this >> >>>>>> >> >>>>>>>>> all lives on the heap. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast >> when >> >>>>>> >> >>>>>>>>> it has >> >>>>>> >> >>>>>>>>> to >> >>>>>> >> >>>>>>>>> deal with nested arrays. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a >> better >> >>>>>> >> >>>>>>>>> thing. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> The Problem >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> ----------------- >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Consider the scenario where you write a classic >> >>>>>> >> >>>>>>>>> doubly-linked >> >>>>>> >> >>>>>>>>> list >> >>>>>> >> >>>>>>>>> in Haskell. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 >> >>>>>> >> >>>>>>>>> pointers >> >>>>>> >> >>>>>>>>> on >> >>>>>> >> >>>>>>>>> the heap. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe >> DLL) >> >>>>>> >> >>>>>>>>> ~> >> >>>>>> >> >>>>>>>>> Maybe >> >>>>>> >> >>>>>>>>> DLL ~> DLL >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> That is 3 levels of indirection. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >>>>>> >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for >> DLL >> >>>>>> >> >>>>>>>>> and >> >>>>>> >> >>>>>>>>> worsening our representation. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> This means that every operation we perform on this >> >>>>>> >> >>>>>>>>> structure >> >>>>>> >> >>>>>>>>> will >> >>>>>> >> >>>>>>>>> be about half of the speed of an implementation in >> most >> >>>>>> >> >>>>>>>>> other >> >>>>>> >> >>>>>>>>> languages >> >>>>>> >> >>>>>>>>> assuming we're memory bound on loading things into >> cache! >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Making Progress >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> ---------------------- >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I have been working on a number of data structures >> where >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> indirection of going from something in * out to an >> object >> >>>>>> >> >>>>>>>>> in # >> >>>>>> >> >>>>>>>>> which >> >>>>>> >> >>>>>>>>> contains the real pointer to my target and coming back >> >>>>>> >> >>>>>>>>> effectively doubles >> >>>>>> >> >>>>>>>>> my runtime. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> MutVar# >> >>>>>> >> >>>>>>>>> onto the mutable list when we dirty it. There is a >> well >> >>>>>> >> >>>>>>>>> defined >> >>>>>> >> >>>>>>>>> write-barrier. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I could change out the representation to use >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# >> every >> >>>>>> >> >>>>>>>>> time, >> >>>>>> >> >>>>>>>>> but >> >>>>>> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the >> >>>>>> >> >>>>>>>>> amount of >> >>>>>> >> >>>>>>>>> distinct >> >>>>>> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from >> 3 >> >>>>>> >> >>>>>>>>> per >> >>>>>> >> >>>>>>>>> object to 2. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and >> get to >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> array >> >>>>>> >> >>>>>>>>> object and then chase it to the next DLL and chase >> that >> >>>>>> >> >>>>>>>>> to the >> >>>>>> >> >>>>>>>>> next array. I >> >>>>>> >> >>>>>>>>> do get my two pointers together in memory though. 
I'm >> >>>>>> >> >>>>>>>>> paying for >> >>>>>> >> >>>>>>>>> a card >> >>>>>> >> >>>>>>>>> marking table as well, which I don't particularly need >> >>>>>> >> >>>>>>>>> with just >> >>>>>> >> >>>>>>>>> two >> >>>>>> >> >>>>>>>>> pointers, but we can shed that with the >> >>>>>> >> >>>>>>>>> "SmallMutableArray#" >> >>>>>> >> >>>>>>>>> machinery added >> >>>>>> >> >>>>>>>>> back in 7.10, which is just the old array code a a new >> >>>>>> >> >>>>>>>>> data >> >>>>>> >> >>>>>>>>> type, which can >> >>>>>> >> >>>>>>>>> speed things up a bit when you don't have very big >> >>>>>> >> >>>>>>>>> arrays: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | >> Nil >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and >> >>>>>> >> >>>>>>>>> have two >> >>>>>> >> >>>>>>>>> mutable fields and be able to share the sme write >> >>>>>> >> >>>>>>>>> barrier? >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array >> >>>>>> >> >>>>>>>>> types. >> >>>>>> >> >>>>>>>>> What >> >>>>>> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal >> with >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> impedence >> >>>>>> >> >>>>>>>>> mismatch between the imperative world and Haskell, and >> >>>>>> >> >>>>>>>>> then just >> >>>>>> >> >>>>>>>>> let the >> >>>>>> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just >> make be >> >>>>>> >> >>>>>>>>> a >> >>>>>> >> >>>>>>>>> special >> >>>>>> >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I >> can >> >>>>>> >> >>>>>>>>> even >> >>>>>> >> >>>>>>>>> abuse pattern >> >>>>>> >> >>>>>>>>> synonyms. Alternately I can exploit the internals >> further >> >>>>>> >> >>>>>>>>> to >> >>>>>> >> >>>>>>>>> make this >> >>>>>> >> >>>>>>>>> cheaper. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >> >>>>>> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the >> >>>>>> >> >>>>>>>>> preceding >> >>>>>> >> >>>>>>>>> and next >> >>>>>> >> >>>>>>>>> entry in the linked list. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps >> me' >> >>>>>> >> >>>>>>>>> into a >> >>>>>> >> >>>>>>>>> strict world, and everything there lives in #. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> next :: DLL -> IO DLL >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# >> s >> >>>>>> >> >>>>>>>>> of >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of >> that >> >>>>>> >> >>>>>>>>> code to >> >>>>>> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed >> >>>>>> >> >>>>>>>>> pretty >> >>>>>> >> >>>>>>>>> easily when they >> >>>>>> >> >>>>>>>>> are known strict and you chain operations of this >> sort! 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Cleaning it Up >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> ------------------ >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Now I have one outermost indirection pointing to an >> array >> >>>>>> >> >>>>>>>>> that >> >>>>>> >> >>>>>>>>> points directly to other arrays. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, >> but >> >>>>>> >> >>>>>>>>> I can >> >>>>>> >> >>>>>>>>> fix >> >>>>>> >> >>>>>>>>> that by duplicating the code for MutableArrayArray# >> and >> >>>>>> >> >>>>>>>>> using a >> >>>>>> >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me >> >>>>>> >> >>>>>>>>> store a >> >>>>>> >> >>>>>>>>> mixture of >> >>>>>> >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data >> >>>>>> >> >>>>>>>>> structure. >> >>>>>> >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> existing >> >>>>>> >> >>>>>>>>> SmallMutableArray# primitives to change the kind of >> one >> >>>>>> >> >>>>>>>>> of the >> >>>>>> >> >>>>>>>>> arguments it >> >>>>>> >> >>>>>>>>> takes. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> This is almost ideal, but not quite. I often have >> fields >> >>>>>> >> >>>>>>>>> that >> >>>>>> >> >>>>>>>>> would >> >>>>>> >> >>>>>>>>> be best left unboxed. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> was able to unpack the Int, but we lost that. 
We can >> >>>>>> >> >>>>>>>>> currently >> >>>>>> >> >>>>>>>>> at >> >>>>>> >> >>>>>>>>> best point one of the entries of the >> SmallMutableArray# >> >>>>>> >> >>>>>>>>> at a >> >>>>>> >> >>>>>>>>> boxed or at a >> >>>>>> >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove >> the >> >>>>>> >> >>>>>>>>> int in >> >>>>>> >> >>>>>>>>> question in >> >>>>>> >> >>>>>>>>> there. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I >> >>>>>> >> >>>>>>>>> need to >> >>>>>> >> >>>>>>>>> store masks and administrivia as I walk down the tree. >> >>>>>> >> >>>>>>>>> Having to >> >>>>>> >> >>>>>>>>> go off to >> >>>>>> >> >>>>>>>>> the side costs me the entire win from avoiding the >> first >> >>>>>> >> >>>>>>>>> pointer >> >>>>>> >> >>>>>>>>> chase. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we >> >>>>>> >> >>>>>>>>> could >> >>>>>> >> >>>>>>>>> construct that had n words with unsafe access and m >> >>>>>> >> >>>>>>>>> pointers to >> >>>>>> >> >>>>>>>>> other heap >> >>>>>> >> >>>>>>>>> objects, one that could put itself on the mutable list >> >>>>>> >> >>>>>>>>> when any >> >>>>>> >> >>>>>>>>> of those >> >>>>>> >> >>>>>>>>> pointers changed then I could shed this last factor of >> >>>>>> >> >>>>>>>>> two in >> >>>>>> >> >>>>>>>>> all >> >>>>>> >> >>>>>>>>> circumstances. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Prototype >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> ------------- >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Over the last few days I've put together a small >> >>>>>> >> >>>>>>>>> prototype >> >>>>>> >> >>>>>>>>> implementation with a few non-trivial imperative data >> >>>>>> >> >>>>>>>>> structures >> >>>>>> >> >>>>>>>>> for things >> >>>>>> >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling >> problem >> >>>>>> >> >>>>>>>>> and >> >>>>>> >> >>>>>>>>> order-maintenance. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> https://github.com/ekmett/structs >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Notable bits: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an >> implementation >> >>>>>> >> >>>>>>>>> of >> >>>>>> >> >>>>>>>>> link-cut >> >>>>>> >> >>>>>>>>> trees in this style. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying >> guts >> >>>>>> >> >>>>>>>>> that >> >>>>>> >> >>>>>>>>> make >> >>>>>> >> >>>>>>>>> it go fast. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, >> >>>>>> >> >>>>>>>>> almost >> >>>>>> >> >>>>>>>>> all >> >>>>>> >> >>>>>>>>> the references to the LinkCut or Object data >> constructor >> >>>>>> >> >>>>>>>>> get >> >>>>>> >> >>>>>>>>> optimized away, >> >>>>>> >> >>>>>>>>> and we're left with beautiful strict code directly >> >>>>>> >> >>>>>>>>> mutating our >> >>>>>> >> >>>>>>>>> underlying >> >>>>>> >> >>>>>>>>> representation. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it >> into a >> >>>>>> >> >>>>>>>>> short >> >>>>>> >> >>>>>>>>> article. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> -Edward >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >> >>>>>> >> >>>>>>>>> wrote: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Just to say that I have no idea what is going on in >> this >> >>>>>> >> >>>>>>>>> thread. >> >>>>>> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is >> >>>>>> >> >>>>>>>>> there a >> >>>>>> >> >>>>>>>>> ticket? Is >> >>>>>> >> >>>>>>>>> there a wiki page? >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket >> would >> >>>>>> >> >>>>>>>>> be a >> >>>>>> >> >>>>>>>>> good >> >>>>>> >> >>>>>>>>> thing. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Simon >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] >> On >> >>>>>> >> >>>>>>>>> Behalf >> >>>>>> >> >>>>>>>>> Of >> >>>>>> >> >>>>>>>>> Edward Kmett >> >>>>>> >> >>>>>>>>> Sent: 21 August 2015 05:25 >> >>>>>> >> >>>>>>>>> To: Manuel M T Chakravarty >> >>>>>> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >> >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> When (ab)using them for this purpose, >> SmallArrayArray's >> >>>>>> >> >>>>>>>>> would be >> >>>>>> >> >>>>>>>>> very handy as well. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Consider right now if I have something like an >> >>>>>> >> >>>>>>>>> order-maintenance >> >>>>>> >> >>>>>>>>> structure I have: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} >> !(MutableByteArray s) >> >>>>>> >> >>>>>>>>> {-# >> >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} >> !(MutVar >> >>>>>> >> >>>>>>>>> s >> >>>>>> >> >>>>>>>>> (Upper s)) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper >> s)) >> >>>>>> >> >>>>>>>>> {-# >> >>>>>> >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} >> !(MutVar >> >>>>>> >> >>>>>>>>> s >> >>>>>> >> >>>>>>>>> (Lower s)) {-# >> >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and >> two >> >>>>>> >> >>>>>>>>> pointers, >> >>>>>> >> >>>>>>>>> one for forward and one for backwards. The latter is >> >>>>>> >> >>>>>>>>> basically >> >>>>>> >> >>>>>>>>> the same >> >>>>>> >> >>>>>>>>> thing with a mutable reference up pointing at the >> >>>>>> >> >>>>>>>>> structure >> >>>>>> >> >>>>>>>>> above. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> On the heap this is an object that points to a >> structure >> >>>>>> >> >>>>>>>>> for the >> >>>>>> >> >>>>>>>>> bytearray, and points to another structure for each >> >>>>>> >> >>>>>>>>> mutvar which >> >>>>>> >> >>>>>>>>> each point >> >>>>>> >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >> >>>>>> >> >>>>>>>>> indirection smeared >> >>>>>> >> >>>>>>>>> over everything. 
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an >> upward >> >>>>>> >> >>>>>>>>> link >> >>>>>> >> >>>>>>>>> from >> >>>>>> >> >>>>>>>>> the structure below to the structure above. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a >> MutableByteArray#, >> >>>>>> >> >>>>>>>>> and >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> next 2 slots pointing to the previous and next >> >>>>>> >> >>>>>>>>> objects, >> >>>>>> >> >>>>>>>>> represented >> >>>>>> >> >>>>>>>>> just as their MutableArrayArray#s. I can use >> >>>>>> >> >>>>>>>>> sameMutableArrayArray# on these >> >>>>>> >> >>>>>>>>> for object identity, which lets me check for the ends >> of >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> lists by tying >> >>>>>> >> >>>>>>>>> things back on themselves. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> and below that >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot >> pointing >> >>>>>> >> >>>>>>>>> up to >> >>>>>> >> >>>>>>>>> an >> >>>>>> >> >>>>>>>>> upper structure. 
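Slot accessors for this `Upper` encoding might look as follows. The layout (slot 0 = the MutableByteArray# for the mutable int, slots 1 and 2 = previous/next) and the function names are illustrative assumptions, not code from the thread:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.IO (IO (..))
import GHC.Prim
import GHC.Types (Int (..), isTrue#)

data Upper = Upper (MutableArrayArray# RealWorld)

-- Read the mutable integer stored in the MutableByteArray# at slot 0.
upperKey :: Upper -> IO Int
upperKey (Upper m) = IO $ \s ->
  case readMutableByteArrayArray# m 0# s of
    (# s1, b #) -> case readIntArray# b 0# s1 of
      (# s2, i #) -> (# s2, I# i #)

-- Follow the forward pointer assumed to live in slot 2.
upperNext :: Upper -> IO Upper
upperNext (Upper m) = IO $ \s ->
  case readMutableArrayArray# m 2# s of
    (# s', n #) -> (# s', Upper n #)

-- End-of-list check via pointer identity, tying the list back on itself.
sameUpper :: Upper -> Upper -> Bool
sameUpper (Upper a) (Upper b) = isTrue# (sameMutableArrayArray# a b)
```

With this encoding there are only two heap objects per node, the MutableArrayArray# and its MutableByteArray#, matching the count given in the message below.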
>> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting >> out >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> slots >> >>>>>> >> >>>>>>>>> in question, while it has gained a level of >> indirection >> >>>>>> >> >>>>>>>>> between >> >>>>>> >> >>>>>>>>> the wrapper >> >>>>>> >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that >> >>>>>> >> >>>>>>>>> one can >> >>>>>> >> >>>>>>>>> be basically >> >>>>>> >> >>>>>>>>> erased by ghc. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on >> >>>>>> >> >>>>>>>>> the heap >> >>>>>> >> >>>>>>>>> for >> >>>>>> >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# >> for >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> object itself, >> >>>>>> >> >>>>>>>>> and the MutableByteArray# that it references to carry >> >>>>>> >> >>>>>>>>> around the >> >>>>>> >> >>>>>>>>> mutable >> >>>>>> >> >>>>>>>>> int. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> The only pain points are >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> 1.) the aforementioned limitation that currently >> prevents >> >>>>>> >> >>>>>>>>> me >> >>>>>> >> >>>>>>>>> from >> >>>>>> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or >> Array >> >>>>>> >> >>>>>>>>> into an >> >>>>>> >> >>>>>>>>> ArrayArray >> >>>>>> >> >>>>>>>>> leaving me in a little ghetto disconnected from the >> rest >> >>>>>> >> >>>>>>>>> of >> >>>>>> >> >>>>>>>>> Haskell, >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> and >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us >> >>>>>> >> >>>>>>>>> avoid the >> >>>>>> >> >>>>>>>>> card marking overhead. 
These objects are all small, >> 3-4 >> >>>>>> >> >>>>>>>>> pointers >> >>>>>> >> >>>>>>>>> wide. Card >> >>>>>> >> >>>>>>>>> marking doesn't help. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things >> and >> >>>>>> >> >>>>>>>>> convert >> >>>>>> >> >>>>>>>>> the whole mess to SmallArrays and then figure out how >> to >> >>>>>> >> >>>>>>>>> unsafeCoerce my way >> >>>>>> >> >>>>>>>>> to glory, stuffing the #'d references to the other >> arrays >> >>>>>> >> >>>>>>>>> directly into the >> >>>>>> >> >>>>>>>>> SmallArray as slots, removing the limitation we see >> here >> >>>>>> >> >>>>>>>>> by >> >>>>>> >> >>>>>>>>> aping the >> >>>>>> >> >>>>>>>>> MutableArrayArray# s API, but that gets really really >> >>>>>> >> >>>>>>>>> dangerous! >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything >> on >> >>>>>> >> >>>>>>>>> the >> >>>>>> >> >>>>>>>>> altar >> >>>>>> >> >>>>>>>>> of speed here, but I'd like to be able to let the GC >> move >> >>>>>> >> >>>>>>>>> them >> >>>>>> >> >>>>>>>>> and collect >> >>>>>> >> >>>>>>>>> them which rules out simpler Ptr and Addr based >> >>>>>> >> >>>>>>>>> solutions. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> -Edward >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T >> Chakravarty >> >>>>>> >> >>>>>>>>> wrote: >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> That's an interesting idea. >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> Manuel >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> > Edward Kmett : >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > Would it be possible to add unsafe primops to add >> >>>>>> >> >>>>>>>>> > Array# and >> >>>>>> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? 
The fact that >> >>>>>> >> >>>>>>>>> > the >> >>>>>> >> >>>>>>>>> > ArrayArray# entries >> >>>>>> >> >>>>>>>>> > are all directly unlifted avoiding a level of >> >>>>>> >> >>>>>>>>> > indirection for >> >>>>>> >> >>>>>>>>> > the containing >> >>>>>> >> >>>>>>>>> > structure is amazing, but I can only currently use >> it >> >>>>>> >> >>>>>>>>> > if my >> >>>>>> >> >>>>>>>>> > leaf level data >> >>>>>> >> >>>>>>>>> > can be 100% unboxed and distributed among >> ByteArray#s. >> >>>>>> >> >>>>>>>>> > It'd be >> >>>>>> >> >>>>>>>>> > nice to be >> >>>>>> >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff >> >>>>>> >> >>>>>>>>> > down at >> >>>>>> >> >>>>>>>>> > the leaves to >> >>>>>> >> >>>>>>>>> > hold lifted contents. >> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > I accept fully that if I name the wrong type when I >> go >> >>>>>> >> >>>>>>>>> > to >> >>>>>> >> >>>>>>>>> > access >> >>>>>> >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose >> it'd >> >>>>>> >> >>>>>>>>> > do that >> >>>>>> >> >>>>>>>>> > if i tried to >> >>>>>> >> >>>>>>>>> > use one of the members that held a nested >> ArrayArray# >> >>>>>> >> >>>>>>>>> > as a >> >>>>>> >> >>>>>>>>> > ByteArray# >> >>>>>> >> >>>>>>>>> > anyways, so it isn't like there is a safety story >> >>>>>> >> >>>>>>>>> > preventing >> >>>>>> >> >>>>>>>>> > this. >> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > I've been hunting for ways to try to kill the >> >>>>>> >> >>>>>>>>> > indirection >> >>>>>> >> >>>>>>>>> > problems I get with Haskell and mutable structures, >> and >> >>>>>> >> >>>>>>>>> > I >> >>>>>> >> >>>>>>>>> > could shoehorn a >> >>>>>> >> >>>>>>>>> > number of them into ArrayArrays if this worked. 
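The unsafe primop the message asks for can already be approximated today with unsafeCoerce#, roughly as below. This is a sketch of the "lie about the type" workaround the author says he would accept; the helper names are made up, and the coercions carry exactly the safety caveats discussed, since nothing stops a read at the wrong type:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.IO (IO (..))
import GHC.Prim

-- Boxed wrapper so an unlifted SmallMutableArray# can be returned in IO.
data SmallBox a = SmallBox (SmallMutableArray# RealWorld a)

-- Hypothetical helper: stuff a SmallMutableArray# (which can hold
-- lifted leaf data) into an ArrayArray# slot by coercing it to the
-- slot's expected unlifted array kind.
writeSmallArraySlot
  :: MutableArrayArray# RealWorld -> Int#
  -> SmallMutableArray# RealWorld a -> IO ()
writeSmallArraySlot m i sa = IO $ \s ->
  (# writeMutableArrayArray# m i (unsafeCoerce# sa) s, () #)

-- Reading it back trusts the caller to name the right type.
readSmallArraySlot
  :: MutableArrayArray# RealWorld -> Int# -> IO (SmallBox a)
readSmallArraySlot m i = IO $ \s ->
  case readMutableArrayArray# m i s of
    (# s', a #) -> (# s', SmallBox (unsafeCoerce# a) #)
```

Both arrays are pointer-sized heap objects, so the coercion is representationally plausible, but as the surrounding discussion notes, mistakes here segfault rather than fail gracefully.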
>> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of >> >>>>>> >> >>>>>>>>> > unnecessary >> >>>>>> >> >>>>>>>>> > indirection compared to c/java and this could reduce >> >>>>>> >> >>>>>>>>> > that pain >> >>>>>> >> >>>>>>>>> > to just 1 >> >>>>>> >> >>>>>>>>> > level of unnecessary indirection. >> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > -Edward >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> > _______________________________________________ >> >>>>>> >> >>>>>>>>> > ghc-devs mailing list >> >>>>>> >> >>>>>>>>> > ghc-devs at haskell.org >> >>>>>> >> >>>>>>>>> > >> >>>>>> >> >>>>>>>>> > >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>>>>>> _______________________________________________ >> >>>>>> >> >>>>>>>>> ghc-devs mailing list >> >>>>>> >> >>>>>>>>> ghc-devs at haskell.org >> >>>>>> >> >>>>>>>>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>>>> >> >>>>>> >> >>>>> >> >>>>>> >> >>> >> >>>>>> >> >> >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > _______________________________________________ >> >>>>>> >> > ghc-devs mailing list >> >>>>>> >> > ghc-devs at haskell.org >> >>>>>> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>> >> > >> >>>>>> > >> >>>>>> > >> >>>>> >> >>>>> >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> ghc-devs mailing list >> >>>> ghc-devs at haskell.org >> >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>> >> >>> >> >> >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michael at snoyman.com Tue Sep 1 06:53:00 2015 From: michael at snoyman.com (Michael Snoyman) Date: Tue, 1 Sep 2015 09:53:00 +0300 Subject: more releases In-Reply-To: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> Message-ID: It's definitely an interesting idea. From the Stackage side: I'm happy to provide testing and, even better, support to get some automated Stackage testing tied into the GHC release process. (Why not be more aggressive? We could do some CI against Stackage from the 7.10 branch on a regular basis.) I like the idea of getting bug fixes out to users more frequently, so I'm definitely +1 on the discussion. Let me play devil's advocate though: having a large number of versions of GHC out there can make it difficult for library authors, package curators, and large open source projects, due to the variety of what people are using. If we end up in a world where virtually everyone ends up on the latest point release in a short timeframe, the problem is reduced, but most of our current installation methods are not amenable to that. We need to have a serious discussion about how Linux distros, Haskell Platform, minimal installers, and so on would address this shift. (stack would be able to adapt to this easily since it can download new GHCs as needed, but users may not like having 100MB installs on a daily basis ;).) What I would love to see is that bug fixes are regularly backported to the stable GHC release and then released within a reasonable timeframe, where reasonable is some value we can discuss and come to consensus on. I'll say that at the extremes: I think a week is far too short, and a year is far too long. On Tue, Sep 1, 2015 at 9:45 AM, Richard Eisenberg wrote: > Hi devs, > > An interesting topic came up over dinner tonight: what if GHC made more > releases? 
As an extreme example, we could release a new point version every > time a bug fix gets merged to the stable branch. This may be a terrible > idea. But what's stopping us from doing so? > > The biggest objection I can see is that we would want to make sure that > users' code would work with the new version. Could the Stackage crew help > us with this? If they run their nightly build with a release candidate and > diff against the prior results, we would get a pretty accurate sense of > whether the bugfix is good. If this test succeeds, why not release? Would > it be hard to automate the packaging/posting process? > > The advantage to more releases is that it gets bugfixes in more hands > sooner. What are the disadvantages? > > Richard > > PS: I'm not 100% sold on this idea. But I thought it was interesting > enough to raise a broader discussion. > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hvriedel at gmail.com Tue Sep 1 07:01:55 2015 From: hvriedel at gmail.com (Herbert Valerio Riedel) Date: Tue, 01 Sep 2015 09:01:55 +0200 Subject: more releases In-Reply-To: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> (Richard Eisenberg's message of "Mon, 31 Aug 2015 23:45:40 -0700") References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> Message-ID: <87si6y1v30.fsf@gmail.com> On 2015-09-01 at 08:45:40 +0200, Richard Eisenberg wrote: > An interesting topic came up over dinner tonight: what if GHC made > more releases? As an extreme example, we could release a new point > version every time a bug fix gets merged to the stable branch. This > may be a terrible idea. But what's stopping us from doing so? > > The biggest objection I can see is that we would want to make sure > that users' code would work with the new version. Could the Stackage > crew help us with this? 
If they run their nightly build with a release > candidate and diff against the prior results, we would get a pretty > accurate sense of whether the bugfix is good. If this test succeeds, > why not release? Would it be hard to automate the packaging/posting > process? > > The advantage to more releases is that it gets bugfixes in more hands > sooner. What are the disadvantages? I'd say mostly organisational overhead which can't be fully automated (afaik, Ben has already automated large parts but not everything can be): - Coordinating with people creating and testing the bindists - Writing release notes & announcement - Coordinating with the HP release process (which requires separate QA) - If bundled core-libraries are affected, coordination overhead with package maintainers (unless GHC HQ owned), verifying version bumps (API diff!) and changelogs have been updated accordingly, uploading to Hackage - Uploading and signing packages to download.haskell.org, and verifying the downloads Austin & Ben probably have more to add to this list That said, doing more stable point releases is certainly doable if the bugs fixed are critical enough. This is mostly a trade-off between time spent on getting GHC HEAD in shape for the next major release (whose release-schedules suffer from time delays anyway) vs. maintaining a stable branch. Cheers, hvr From eir at cis.upenn.edu Tue Sep 1 07:12:21 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Tue, 1 Sep 2015 00:12:21 -0700 Subject: more releases In-Reply-To: <87si6y1v30.fsf@gmail.com> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> Message-ID: On Sep 1, 2015, at 12:01 AM, Herbert Valerio Riedel wrote: > I'd say mostly organisational overhead which can't be fully automated > (afaik, Ben has already automated large parts but not everything can be): > > - Coordinating with people creating and testing the bindists This was the sort of thing I thought could be automated. 
I'm picturing a system where Austin/Ben hits a button and everything whirs to life, creating, testing, and posting bindists, with no people involved. > - Writing release notes & announcement Release notes should, theoretically, be updated with the patches. Announcement can be automated. > - Coordinating with the HP release process (which requires separate QA) I'm sure others will have opinions here, but I guess I was thinking that the HP wouldn't be involved. These tiny releases could even be called something like "7.10.2 build 18". The HP would get updated only when we go to 7.10.3. Maybe we even have a binary compatibility requirement between tiny releases -- no interface file changes! Then a user's package library doesn't have to be recompiled when updating. In theory, other than the bugfixes, two people with different "builds" of GHC should have the same experience. > - If bundled core-libraries are affected, coordination overhead with package > maintainers (unless GHC HQ owned), verifying version bumps (API diff!) and > changelogs have been updated accordingly, uploading to Hackage Any library version change would require a more proper release. Do these libraries tend to change during a major release cycle? > - Uploading and signing packages to download.haskell.org, and verifying > the downloads This isn't automated? > > Austin & Ben probably have more to add to this list > I'm sure they do. Again, I'd be fine if the answer from the community is "it's just not what we need". But I wanted to see if there were technical/practical/social reasons why this was or wasn't a good idea. If we do think it's a good idea absent those reasons, then we can work on addressing those concerns. Richard > That said, doing more stable point releases is certainly doable if the > bugs fixed are critical enough. This is mostly a trade-off between time > spent on getting GHC HEAD in shape for the next major release (whose > release-schedules suffer from time delays anyway) vs. 
maintaining a > stable branch. > > Cheers, > hvr From simonpj at microsoft.com Tue Sep 1 11:50:05 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 1 Sep 2015 11:50:05 +0000 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> Message-ID: <107de3fcc21b4ccab7a14cc908cdb110@AM3PR30MB019.064d.mgd.msft.net> OK Tuesday afternoon break! S From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Johan Tibell Sent: 01 September 2015 06:14 To: Ryan Yates Cc: Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates Subject: Re: ArrayArrays Works for me. On Mon, Aug 31, 2015 at 3:50 PM, Ryan Yates > wrote: Any time works for me. Ryan On Mon, Aug 31, 2015 at 6:11 PM, Ryan Newton > wrote: > Dear Edward, Ryan Yates, and other interested parties -- > > So when should we meet up about this? > > May I propose the Tues afternoon break for everyone at ICFP who is > interested in this topic? We can meet out in the coffee area and congregate > around Edward Kmett, who is tall and should be easy to find ;-). > > I think Ryan is going to show us how to use his new primops for combined > array + other fields in one heap object? > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett > wrote: >> >> Without a custom primitive it doesn't help much there, you have to store >> the indirection to the mask. >> >> With a custom primitive it should cut the on heap root-to-leaf path of >> everything in the HAMT in half. A shorter HashMap was actually one of the >> motivating factors for me doing this. It is rather astoundingly difficult to >> beat the performance of HashMap, so I had to start cheating pretty badly. 
;) >> >> -Edward >> >> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > >> wrote: >>> >>> I'd also be interested to chat at ICFP to see if I can use this for my >>> HAMT implementation. >>> >>> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > wrote: >>>> >>>> Sounds good to me. Right now I'm just hacking up composable accessors >>>> for "typed slots" in a fairly lens-like fashion, and treating the set of >>>> slots I define and the 'new' function I build for the data type as its API, >>>> and build atop that. This could eventually graduate to template-haskell, but >>>> I'm not entirely satisfied with the solution I have. I currently distinguish >>>> between what I'm calling "slots" (things that point directly to another >>>> SmallMutableArrayArray# sans wrapper) and "fields" which point directly to >>>> the usual Haskell data types because unifying the two notions meant that I >>>> couldn't lift some coercions out "far enough" to make them vanish. >>>> >>>> I'll be happy to run through my current working set of issues in person >>>> and -- as things get nailed down further -- in a longer lived medium than in >>>> personal conversations. ;) >>>> >>>> -Edward >>>> >>>> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > wrote: >>>>> >>>>> I'd also love to meet up at ICFP and discuss this. I think the array >>>>> primops plus a TH layer that lets (ab)use them many times without too much >>>>> marginal cost sounds great. And I'd like to learn how we could be either >>>>> early users of, or help with, this infrastructure. >>>>> >>>>> CC'ing in Ryan Scot and Omer Agacan who may also be interested in >>>>> dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student >>>>> who is currently working on concurrent data structures in Haskell, but will >>>>> not be at ICFP. >>>>> >>>>> >>>>> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > >>>>> wrote: >>>>>> >>>>>> I completely agree. 
I would love to spend some time during ICFP and >>>>>> friends talking about what it could look like. My small array for STM >>>>>> changes for the RTS can be seen here [1]. It is on a branch somewhere >>>>>> between 7.8 and 7.10 and includes irrelevant STM bits and some >>>>>> confusing naming choices (sorry), but should cover all the details >>>>>> needed to implement it for a non-STM context. The biggest surprise >>>>>> for me was following small array too closely and having a word/byte >>>>>> offset miss-match [2]. >>>>>> >>>>>> [1]: >>>>>> https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut >>>>>> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 >>>>>> >>>>>> Ryan >>>>>> >>>>>> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett > >>>>>> wrote: >>>>>> > I'd love to have that last 10%, but its a lot of work to get there >>>>>> > and more >>>>>> > importantly I don't know quite what it should look like. >>>>>> > >>>>>> > On the other hand, I do have a pretty good idea of how the >>>>>> > primitives above >>>>>> > could be banged out and tested in a long evening, well in time for >>>>>> > 7.12. And >>>>>> > as noted earlier, those remain useful even if a nicer typed version >>>>>> > with an >>>>>> > extra level of indirection to the sizes is built up after. >>>>>> > >>>>>> > The rest sounds like a good graduate student project for someone who >>>>>> > has >>>>>> > graduate students lying around. Maybe somebody at Indiana University >>>>>> > who has >>>>>> > an interest in type theory and parallelism can find us one. =) >>>>>> > >>>>>> > -Edward >>>>>> > >>>>>> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates > >>>>>> > wrote: >>>>>> >> >>>>>> >> I think from my perspective, the motivation for getting the type >>>>>> >> checker involved is primarily bringing this to the level where >>>>>> >> users >>>>>> >> could be expected to build these structures. 
it is reasonable to >>>>>> >> think that there are people who want to use STM (a context with >>>>>> >> mutation already) to implement a straight forward data structure >>>>>> >> that >>>>>> >> avoids extra indirection penalty. There should be some places >>>>>> >> where >>>>>> >> knowing that things are field accesses rather then array indexing >>>>>> >> could be helpful, but I think GHC is good right now about handling >>>>>> >> constant offsets. In my code I don't do any bounds checking as I >>>>>> >> know >>>>>> >> I will only be accessing my arrays with constant indexes. I make >>>>>> >> wrappers for each field access and leave all the unsafe stuff in >>>>>> >> there. When things go wrong though, the compiler is no help. >>>>>> >> Maybe >>>>>> >> template Haskell that generates the appropriate wrappers is the >>>>>> >> right >>>>>> >> direction to go. >>>>>> >> There is another benefit for me when working with these as arrays >>>>>> >> in >>>>>> >> that it is quite simple and direct (given the hoops already jumped >>>>>> >> through) to play with alignment. I can ensure two pointers are >>>>>> >> never >>>>>> >> on the same cache-line by just spacing things out in the array. >>>>>> >> >>>>>> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett > >>>>>> >> wrote: >>>>>> >> > They just segfault at this level. ;) >>>>>> >> > >>>>>> >> > Sent from my iPhone >>>>>> >> > >>>>>> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton > >>>>>> >> > wrote: >>>>>> >> > >>>>>> >> > You presumably also save a bounds check on reads by hard-coding >>>>>> >> > the >>>>>> >> > sizes? >>>>>> >> > >>>>>> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett > >>>>>> >> > wrote: >>>>>> >> >> >>>>>> >> >> Also there are 4 different "things" here, basically depending on >>>>>> >> >> two >>>>>> >> >> independent questions: >>>>>> >> >> >>>>>> >> >> a.) if you want to shove the sizes into the info table, and >>>>>> >> >> b.) if you want cardmarking. 
>>>>>> >> >> >>>>>> >> >> Versions with/without cardmarking for different sizes can be >>>>>> >> >> done >>>>>> >> >> pretty >>>>>> >> >> easily, but as noted, the infotable variants are pretty >>>>>> >> >> invasive. >>>>>> >> >> >>>>>> >> >> -Edward >>>>>> >> >> >>>>>> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > >>>>>> >> >> wrote: >>>>>> >> >>> >>>>>> >> >>> Well, on the plus side you'd save 16 bytes per object, which >>>>>> >> >>> adds up >>>>>> >> >>> if >>>>>> >> >>> they were small enough and there are enough of them. You get a >>>>>> >> >>> bit >>>>>> >> >>> better >>>>>> >> >>> locality of reference in terms of what fits in the first cache >>>>>> >> >>> line of >>>>>> >> >>> them. >>>>>> >> >>> >>>>>> >> >>> -Edward >>>>>> >> >>> >>>>>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton >>>>>> >> >>> > >>>>>> >> >>> wrote: >>>>>> >> >>>> >>>>>> >> >>>> Yes. And for the short term I can imagine places we will >>>>>> >> >>>> settle with >>>>>> >> >>>> arrays even if it means tracking lengths unnecessarily and >>>>>> >> >>>> unsafeCoercing >>>>>> >> >>>> pointers whose types don't actually match their siblings. >>>>>> >> >>>> >>>>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed >>>>>> >> >>>> sized >>>>>> >> >>>> array >>>>>> >> >>>> objects *other* than using them to fake structs? (Much to >>>>>> >> >>>> derecommend, as >>>>>> >> >>>> you mentioned!) >>>>>> >> >>>> >>>>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett >>>>>> >> >>>> > >>>>>> >> >>>> wrote: >>>>>> >> >>>>> >>>>>> >> >>>>> I think both are useful, but the one you suggest requires a >>>>>> >> >>>>> lot more >>>>>> >> >>>>> plumbing and doesn't subsume all of the usecases of the >>>>>> >> >>>>> other. 
>>>>>> >> >>>>> >>>>>> >> >>>>> -Edward >>>>>> >> >>>>> >>>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton >>>>>> >> >>>>> > >>>>>> >> >>>>> wrote: >>>>>> >> >>>>>> >>>>>> >> >>>>>> So that primitive is an array-like thing (same pointed type, >>>>>> >> >>>>>> unbounded >>>>>> >> >>>>>> length) with extra payload. >>>>>> >> >>>>>> >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, >>>>>> >> >>>>>> especially >>>>>> >> >>>>>> with the extra payload at front. But wouldn't the general >>>>>> >> >>>>>> solution >>>>>> >> >>>>>> for >>>>>> >> >>>>>> structs be one that allows new user data type defs for >>>>>> >> >>>>>> # >>>>>> >> >>>>>> types? >>>>>> >> >>>>>> >>>>>> >> >>>>>> >>>>>> >> >>>>>> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett >>>>>> >> >>>>>> > >>>>>> >> >>>>>> wrote: >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words >>>>>> >> >>>>>>> and a >>>>>> >> >>>>>>> known >>>>>> >> >>>>>>> number of pointers is basically what Ryan Yates was >>>>>> >> >>>>>>> suggesting >>>>>> >> >>>>>>> above, but >>>>>> >> >>>>>>> where the word counts were stored in the objects >>>>>> >> >>>>>>> themselves. >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts >>>>>> >> >>>>>>> it'd >>>>>> >> >>>>>>> likely >>>>>> >> >>>>>>> want to be something we build in addition to MutVar# rather >>>>>> >> >>>>>>> than a >>>>>> >> >>>>>>> replacement. >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and build >>>>>> >> >>>>>>> info >>>>>> >> >>>>>>> tables that knew them, and typechecker support, for >>>>>> >> >>>>>>> instance, it'd >>>>>> >> >>>>>>> get >>>>>> >> >>>>>>> rather invasive.
>>>>>> >> >>>>>>> >>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' >>>>>> >> >>>>>>> versions >>>>>> >> >>>>>>> above, like working with evil unsized c-style arrays >>>>>> >> >>>>>>> directly >>>>>> >> >>>>>>> inline at the >>>>>> >> >>>>>>> end of the structure cease to be possible, so it isn't even >>>>>> >> >>>>>>> a pure >>>>>> >> >>>>>>> win if we >>>>>> >> >>>>>>> did the engineering effort. >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by adding >>>>>> >> >>>>>>> the one >>>>>> >> >>>>>>> primitive. The last 10% gets pretty invasive. >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> -Edward >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton >>>>>> >> >>>>>>> > >>>>>> >> >>>>>>> wrote: >>>>>> >> >>>>>>>> >>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable >>>>>> >> >>>>>>>> structs >>>>>> >> >>>>>>>> (like Ed said), and I'm trying to fully understand why >>>>>> >> >>>>>>>> it's hard. >>>>>> >> >>>>>>>> >>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of >>>>>> >> >>>>>>>> object >>>>>> >> >>>>>>>> identity problems. But what about directly supporting an >>>>>> >> >>>>>>>> extensible set of >>>>>> >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even >>>>>> >> >>>>>>>> replacing) >>>>>> >> >>>>>>>> MutVar#? That >>>>>> >> >>>>>>>> may be too much work, but is it problematic otherwise? >>>>>> >> >>>>>>>> >>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want >>>>>> >> >>>>>>>> best in >>>>>> >> >>>>>>>> class >>>>>> >> >>>>>>>> lockfree mutable structures, just like their Stm and >>>>>> >> >>>>>>>> sequential >>>>>> >> >>>>>>>> counterparts. >>>>>> >> >>>>>>>> >>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >>>>>> >> >>>>>>>> > wrote: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a >>>>>> >> >>>>>>>>> short >>>>>> >> >>>>>>>>> article. 
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, >>>>>> >> >>>>>>>>> and >>>>>> >> >>>>>>>>> maybe >>>>>> >> >>>>>>>>> make a ticket for it. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Thanks >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Simon >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] >>>>>> >> >>>>>>>>> Sent: 27 August 2015 16:54 >>>>>> >> >>>>>>>>> To: Simon Peyton Jones >>>>>> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified >>>>>> >> >>>>>>>>> invariant. It >>>>>> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or >>>>>> >> >>>>>>>>> ByteArray#'s. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> While those live in #, they are garbage collected >>>>>> >> >>>>>>>>> objects, so >>>>>> >> >>>>>>>>> this >>>>>> >> >>>>>>>>> all lives on the heap. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when >>>>>> >> >>>>>>>>> it has >>>>>> >> >>>>>>>>> to >>>>>> >> >>>>>>>>> deal with nested arrays. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better >>>>>> >> >>>>>>>>> thing. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> The Problem >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> ----------------- >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Consider the scenario where you write a classic >>>>>> >> >>>>>>>>> doubly-linked >>>>>> >> >>>>>>>>> list >>>>>> >> >>>>>>>>> in Haskell. 
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 >>>>>> >> >>>>>>>>> pointers >>>>>> >> >>>>>>>>> on >>>>>> >> >>>>>>>>> the heap. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) >>>>>> >> >>>>>>>>> ~> >>>>>> >> >>>>>>>>> Maybe >>>>>> >> >>>>>>>>> DLL ~> DLL >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> That is 3 levels of indirection. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >>>>>> >> >>>>>>>>> -funbox-strict-fields or UNPACK >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL >>>>>> >> >>>>>>>>> and >>>>>> >> >>>>>>>>> worsening our representation. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> This means that every operation we perform on this >>>>>> >> >>>>>>>>> structure >>>>>> >> >>>>>>>>> will >>>>>> >> >>>>>>>>> be about half of the speed of an implementation in most >>>>>> >> >>>>>>>>> other >>>>>> >> >>>>>>>>> languages >>>>>> >> >>>>>>>>> assuming we're memory bound on loading things into cache!
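[As a runnable aside (all function names here are mine, not from the thread), the "worsened" representation above can be exercised directly. Every hop still pays one MutVar# indirection per field, which is exactly the cost being discussed:]

```haskell
-- Sketch of the strict-field DLL with an explicit Nil constructor.
import Data.IORef

data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil

-- A fresh node whose prev/next both start out Nil.
node :: IO DLL
node = DLL <$> newIORef Nil <*> newIORef Nil

next :: DLL -> IO DLL
next Nil         = pure Nil
next (DLL _ nxt) = readIORef nxt

-- Link a immediately before b (leaving a's prev and b's next alone).
link :: DLL -> DLL -> IO ()
link a@(DLL _ nxt) b@(DLL prv _) = do
  writeIORef nxt b
  writeIORef prv a
link _ _ = pure ()

-- Walk forward counting nodes until we fall off the end.
lengthFrom :: DLL -> IO Int
lengthFrom Nil = pure 0
lengthFrom d   = (1 +) <$> (next d >>= lengthFrom)
```

[With -funbox-strict-fields (or UNPACK pragmas on the fields), GHC unpacks the IORef wrappers, leaving the MutVar# hop described in the message.]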
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Making Progress >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> ---------------------- >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I have been working on a number of data structures where >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> indirection of going from something in * out to an object >>>>>> >> >>>>>>>>> in # >>>>>> >> >>>>>>>>> which >>>>>> >> >>>>>>>>> contains the real pointer to my target and coming back >>>>>> >> >>>>>>>>> effectively doubles >>>>>> >> >>>>>>>>> my runtime. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> MutVar# >>>>>> >> >>>>>>>>> onto the mutable list when we dirty it. There is a well >>>>>> >> >>>>>>>>> defined >>>>>> >> >>>>>>>>> write-barrier. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I could change out the representation to use >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every >>>>>> >> >>>>>>>>> time, >>>>>> >> >>>>>>>>> but >>>>>> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the >>>>>> >> >>>>>>>>> amount of >>>>>> >> >>>>>>>>> distinct >>>>>> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 >>>>>> >> >>>>>>>>> per >>>>>> >> >>>>>>>>> object to 2. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> array >>>>>> >> >>>>>>>>> object and then chase it to the next DLL and chase that >>>>>> >> >>>>>>>>> to the >>>>>> >> >>>>>>>>> next array. 
I >>>>>> >> >>>>>>>>> do get my two pointers together in memory though. I'm >>>>>> >> >>>>>>>>> paying for >>>>>> >> >>>>>>>>> a card >>>>>> >> >>>>>>>>> marking table as well, which I don't particularly need >>>>>> >> >>>>>>>>> with just >>>>>> >> >>>>>>>>> two >>>>>> >> >>>>>>>>> pointers, but we can shed that with the >>>>>> >> >>>>>>>>> "SmallMutableArray#" >>>>>> >> >>>>>>>>> machinery added >>>>>> >> >>>>>>>>> back in 7.10, which is just the old array code as a new >>>>>> >> >>>>>>>>> data >>>>>> >> >>>>>>>>> type, which can >>>>>> >> >>>>>>>>> speed things up a bit when you don't have very big >>>>>> >> >>>>>>>>> arrays: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and >>>>>> >> >>>>>>>>> have two >>>>>> >> >>>>>>>>> mutable fields and be able to share the same write >>>>>> >> >>>>>>>>> barrier? >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array >>>>>> >> >>>>>>>>> types. >>>>>> >> >>>>>>>>> What >>>>>> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> impedance >>>>>> >> >>>>>>>>> mismatch between the imperative world and Haskell, and >>>>>> >> >>>>>>>>> then just >>>>>> >> >>>>>>>>> let the >>>>>> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be >>>>>> >> >>>>>>>>> a >>>>>> >> >>>>>>>>> special >>>>>> >> >>>>>>>>> MutableArrayArray# I allocate on program startup.
I can >>>>>> >> >>>>>>>>> even >>>>>> >> >>>>>>>>> abuse pattern >>>>>> >> >>>>>>>>> synonyms. Alternately I can exploit the internals further >>>>>> >> >>>>>>>>> to >>>>>> >> >>>>>>>>> make this >>>>>> >> >>>>>>>>> cheaper. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >>>>>> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the >>>>>> >> >>>>>>>>> preceding >>>>>> >> >>>>>>>>> and next >>>>>> >> >>>>>>>>> entry in the linked list. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' >>>>>> >> >>>>>>>>> into a >>>>>> >> >>>>>>>>> strict world, and everything there lives in #. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> next :: DLL -> IO DLL >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s >>>>>> >> >>>>>>>>> of >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that >>>>>> >> >>>>>>>>> code to >>>>>> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed >>>>>> >> >>>>>>>>> pretty >>>>>> >> >>>>>>>>> easily when they >>>>>> >> >>>>>>>>> are known strict and you chain operations of this sort! >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Cleaning it Up >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> ------------------ >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Now I have one outermost indirection pointing to an array >>>>>> >> >>>>>>>>> that >>>>>> >> >>>>>>>>> points directly to other arrays.
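[A fleshed-out sketch of the accessors being described (the slot convention, slot 0 = previous and slot 1 = next, is my assumption, as are the helper names). This targets the ArrayArray# API as it existed around GHC 7.10/8.x; later GHCs subsumed ArrayArray# into arrays of unlifted values:]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- ArrayArray#-backed doubly linked list node, per the message above.
import GHC.Exts
import GHC.IO (IO (..))

data DLL = DLL (MutableArrayArray# RealWorld)

-- A fresh node: newArrayArray# initialises every slot to point back
-- at the new array itself, which can double as a 'Nil' sentinel by
-- tying the list back on itself.
newNode :: IO DLL
newNode = IO $ \s -> case newArrayArray# 2# s of
  (# s', m #) -> (# s', DLL m #)

next :: DLL -> IO DLL
next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of
  (# s', n #) -> (# s', DLL n #)

setNext :: DLL -> DLL -> IO ()
setNext (DLL m) (DLL n) = IO $ \s ->
  (# writeMutableArrayArray# m 1# n s, () #)

-- Object identity, for detecting ends tied back on themselves.
sameNode :: DLL -> DLL -> Bool
sameNode (DLL a) (DLL b) = isTrue# (sameMutableArrayArray# a b)
```

[Nothing here bounds-checks; as in Ryan's earlier message, the indices are constants and the unsafety is confined to these small wrappers.]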
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, but >>>>>> >> >>>>>>>>> I can >>>>>> >> >>>>>>>>> fix >>>>>> >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and >>>>>> >> >>>>>>>>> using a >>>>>> >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me >>>>>> >> >>>>>>>>> store a >>>>>> >> >>>>>>>>> mixture of >>>>>> >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data >>>>>> >> >>>>>>>>> structure. >>>>>> >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> existing >>>>>> >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one >>>>>> >> >>>>>>>>> of the >>>>>> >> >>>>>>>>> arguments it >>>>>> >> >>>>>>>>> takes. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> This is almost ideal, but not quite. I often have fields >>>>>> >> >>>>>>>>> that >>>>>> >> >>>>>>>>> would >>>>>> >> >>>>>>>>> be best left unboxed. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data DLLInt = DLLInt !Int !(IORef DLL) !(IORef DLL) | Nil >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> That was able to unpack the Int, but we lost that. We can >>>>>> >> >>>>>>>>> currently >>>>>> >> >>>>>>>>> at >>>>>> >> >>>>>>>>> best point one of the entries of the SmallMutableArray# >>>>>> >> >>>>>>>>> at a >>>>>> >> >>>>>>>>> boxed or at a >>>>>> >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the >>>>>> >> >>>>>>>>> int in >>>>>> >> >>>>>>>>> question in >>>>>> >> >>>>>>>>> there. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I >>>>>> >> >>>>>>>>> need to >>>>>> >> >>>>>>>>> store masks and administrivia as I walk down the tree.
>>>>>> >> >>>>>>>>> Having to >>>>>> >> >>>>>>>>> go off to >>>>>> >> >>>>>>>>> the side costs me the entire win from avoiding the first >>>>>> >> >>>>>>>>> pointer >>>>>> >> >>>>>>>>> chase. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we >>>>>> >> >>>>>>>>> could >>>>>> >> >>>>>>>>> construct that had n words with unsafe access and m >>>>>> >> >>>>>>>>> pointers to >>>>>> >> >>>>>>>>> other heap >>>>>> >> >>>>>>>>> objects, one that could put itself on the mutable list >>>>>> >> >>>>>>>>> when any >>>>>> >> >>>>>>>>> of those >>>>>> >> >>>>>>>>> pointers changed then I could shed this last factor of >>>>>> >> >>>>>>>>> two in >>>>>> >> >>>>>>>>> all >>>>>> >> >>>>>>>>> circumstances. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Prototype >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> ------------- >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Over the last few days I've put together a small >>>>>> >> >>>>>>>>> prototype >>>>>> >> >>>>>>>>> implementation with a few non-trivial imperative data >>>>>> >> >>>>>>>>> structures >>>>>> >> >>>>>>>>> for things >>>>>> >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem >>>>>> >> >>>>>>>>> and >>>>>> >> >>>>>>>>> order-maintenance. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> https://github.com/ekmett/structs >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Notable bits: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation >>>>>> >> >>>>>>>>> of >>>>>> >> >>>>>>>>> link-cut >>>>>> >> >>>>>>>>> trees in this style. 
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts >>>>>> >> >>>>>>>>> that >>>>>> >> >>>>>>>>> make >>>>>> >> >>>>>>>>> it go fast. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, >>>>>> >> >>>>>>>>> almost >>>>>> >> >>>>>>>>> all >>>>>> >> >>>>>>>>> the references to the LinkCut or Object data constructor >>>>>> >> >>>>>>>>> get >>>>>> >> >>>>>>>>> optimized away, >>>>>> >> >>>>>>>>> and we're left with beautiful strict code directly >>>>>> >> >>>>>>>>> mutating our >>>>>> >> >>>>>>>>> underlying >>>>>> >> >>>>>>>>> representation. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a >>>>>> >> >>>>>>>>> short >>>>>> >> >>>>>>>>> article. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> -Edward >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >>>>>> >> >>>>>>>>> > wrote: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Just to say that I have no idea what is going on in this >>>>>> >> >>>>>>>>> thread. >>>>>> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is >>>>>> >> >>>>>>>>> there a >>>>>> >> >>>>>>>>> ticket? Is >>>>>> >> >>>>>>>>> there a wiki page? >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would >>>>>> >> >>>>>>>>> be a >>>>>> >> >>>>>>>>> good >>>>>> >> >>>>>>>>> thing.
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Simon >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On >>>>>> >> >>>>>>>>> Behalf >>>>>> >> >>>>>>>>> Of >>>>>> >> >>>>>>>>> Edward Kmett >>>>>> >> >>>>>>>>> Sent: 21 August 2015 05:25 >>>>>> >> >>>>>>>>> To: Manuel M T Chakravarty >>>>>> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >>>>>> >> >>>>>>>>> Subject: Re: ArrayArrays >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's >>>>>> >> >>>>>>>>> would be >>>>>> >> >>>>>>>>> very handy as well. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Consider right now if I have something like an >>>>>> >> >>>>>>>>> order-maintenance >>>>>> >> >>>>>>>>> structure I have: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) >>>>>> >> >>>>>>>>> {-# >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar >>>>>> >> >>>>>>>>> s >>>>>> >> >>>>>>>>> (Upper s)) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) >>>>>> >> >>>>>>>>> {-# >>>>>> >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar >>>>>> >> >>>>>>>>> s >>>>>> >> >>>>>>>>> (Lower s)) {-# >>>>>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and two >>>>>> >> >>>>>>>>> pointers, >>>>>> >> >>>>>>>>> one for forward and one for backwards. The latter is >>>>>> >> >>>>>>>>> basically >>>>>> >> >>>>>>>>> the same >>>>>> >> >>>>>>>>> thing with a mutable reference up pointing at the >>>>>> >> >>>>>>>>> structure >>>>>> >> >>>>>>>>> above. 
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> On the heap this is an object that points to a structure >>>>>> >> >>>>>>>>> for the >>>>>> >> >>>>>>>>> bytearray, and points to another structure for each >>>>>> >> >>>>>>>>> mutvar which >>>>>> >> >>>>>>>>> each point >>>>>> >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >>>>>> >> >>>>>>>>> indirection smeared >>>>>> >> >>>>>>>>> over everything. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward >>>>>> >> >>>>>>>>> link >>>>>> >> >>>>>>>>> from >>>>>> >> >>>>>>>>> the structure below to the structure above. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, >>>>>> >> >>>>>>>>> and >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> next 2 slots pointing to the previous and next >>>>>> >> >>>>>>>>> objects, >>>>>> >> >>>>>>>>> represented >>>>>> >> >>>>>>>>> just as their MutableArrayArray#s. I can use >>>>>> >> >>>>>>>>> sameMutableArrayArray# on these >>>>>> >> >>>>>>>>> for object identity, which lets me check for the ends of >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> lists by tying >>>>>> >> >>>>>>>>> things back on themselves.
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> and below that >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing >>>>>> >> >>>>>>>>> up to >>>>>> >> >>>>>>>>> an >>>>>> >> >>>>>>>>> upper structure. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting out >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> slots >>>>>> >> >>>>>>>>> in question, while it has gained a level of indirection >>>>>> >> >>>>>>>>> between >>>>>> >> >>>>>>>>> the wrapper >>>>>> >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that >>>>>> >> >>>>>>>>> one can >>>>>> >> >>>>>>>>> be basically >>>>>> >> >>>>>>>>> erased by ghc. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on >>>>>> >> >>>>>>>>> the heap >>>>>> >> >>>>>>>>> for >>>>>> >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> object itself, >>>>>> >> >>>>>>>>> and the MutableByteArray# that it references to carry >>>>>> >> >>>>>>>>> around the >>>>>> >> >>>>>>>>> mutable >>>>>> >> >>>>>>>>> int. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> The only pain points are >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> 1.) 
the aforementioned limitation that currently prevents >>>>>> >> >>>>>>>>> me >>>>>> >> >>>>>>>>> from >>>>>> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array >>>>>> >> >>>>>>>>> into an >>>>>> >> >>>>>>>>> ArrayArray >>>>>> >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest >>>>>> >> >>>>>>>>> of >>>>>> >> >>>>>>>>> Haskell, >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> and >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us >>>>>> >> >>>>>>>>> avoid the >>>>>> >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 >>>>>> >> >>>>>>>>> pointers >>>>>> >> >>>>>>>>> wide. Card >>>>>> >> >>>>>>>>> marking doesn't help. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things and >>>>>> >> >>>>>>>>> convert >>>>>> >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >>>>>> >> >>>>>>>>> unsafeCoerce my way >>>>>> >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >>>>>> >> >>>>>>>>> directly into the >>>>>> >> >>>>>>>>> SmallArray as slots, removing the limitation we see here >>>>>> >> >>>>>>>>> by >>>>>> >> >>>>>>>>> aping the >>>>>> >> >>>>>>>>> MutableArrayArray# s API, but that gets really really >>>>>> >> >>>>>>>>> dangerous! >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on >>>>>> >> >>>>>>>>> the >>>>>> >> >>>>>>>>> altar >>>>>> >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move >>>>>> >> >>>>>>>>> them >>>>>> >> >>>>>>>>> and collect >>>>>> >> >>>>>>>>> them which rules out simpler Ptr and Addr based >>>>>> >> >>>>>>>>> solutions. 
>>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> -Edward >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >>>>>> >> >>>>>>>>> > wrote: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> That's an interesting idea. >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> Manuel >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> > Edward Kmett >: >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > Would it be possible to add unsafe primops to add >>>>>> >> >>>>>>>>> > Array# and >>>>>> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that >>>>>> >> >>>>>>>>> > the >>>>>> >> >>>>>>>>> > ArrayArray# entries >>>>>> >> >>>>>>>>> > are all directly unlifted avoiding a level of >>>>>> >> >>>>>>>>> > indirection for >>>>>> >> >>>>>>>>> > the containing >>>>>> >> >>>>>>>>> > structure is amazing, but I can only currently use it >>>>>> >> >>>>>>>>> > if my >>>>>> >> >>>>>>>>> > leaf level data >>>>>> >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. >>>>>> >> >>>>>>>>> > It'd be >>>>>> >> >>>>>>>>> > nice to be >>>>>> >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff >>>>>> >> >>>>>>>>> > down at >>>>>> >> >>>>>>>>> > the leaves to >>>>>> >> >>>>>>>>> > hold lifted contents. >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > I accept fully that if I name the wrong type when I go >>>>>> >> >>>>>>>>> > to >>>>>> >> >>>>>>>>> > access >>>>>> >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd >>>>>> >> >>>>>>>>> > do that >>>>>> >> >>>>>>>>> > if I tried to >>>>>> >> >>>>>>>>> > use one of the members that held a nested ArrayArray# >>>>>> >> >>>>>>>>> > as a >>>>>> >> >>>>>>>>> > ByteArray# >>>>>> >> >>>>>>>>> > anyways, so it isn't like there is a safety story >>>>>> >> >>>>>>>>> > preventing >>>>>> >> >>>>>>>>> > this.
>>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > I've been hunting for ways to try to kill the >>>>>> >> >>>>>>>>> > indirection >>>>>> >> >>>>>>>>> > problems I get with Haskell and mutable structures, and >>>>>> >> >>>>>>>>> > I >>>>>> >> >>>>>>>>> > could shoehorn a >>>>>> >> >>>>>>>>> > number of them into ArrayArrays if this worked. >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of >>>>>> >> >>>>>>>>> > unnecessary >>>>>> >> >>>>>>>>> > indirection compared to c/java and this could reduce >>>>>> >> >>>>>>>>> > that pain >>>>>> >> >>>>>>>>> > to just 1 >>>>>> >> >>>>>>>>> > level of unnecessary indirection. >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > -Edward >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> > _______________________________________________ >>>>>> >> >>>>>>>>> > ghc-devs mailing list >>>>>> >> >>>>>>>>> > ghc-devs at haskell.org >>>>>> >> >>>>>>>>> > >>>>>> >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> >>>>>> >> >>>>>>>>> _______________________________________________ >>>>>> >> >>>>>>>>> ghc-devs mailing list >>>>>> >> >>>>>>>>> ghc-devs at haskell.org >>>>>> >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>>>> >> >>>>>>> >>>>>> >> >>>>>>> >>>>>> >> >>>>> >>>>>> >> >>> >>>>>> >> >> >>>>>> >> > >>>>>> >> > >>>>>> >> > _______________________________________________ >>>>>> >> > ghc-devs mailing list >>>>>> >> > ghc-devs at haskell.org >>>>>> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>>>> >> > >>>>>> > >>>>>> > >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> ghc-devs mailing list >>>> ghc-devs at haskell.org >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>> >> > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > 
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From voldermort at hotmail.com Tue Sep 1 11:57:17 2015 From: voldermort at hotmail.com (Harry .) Date: Tue, 1 Sep 2015 11:57:17 +0000 Subject: Planning for the 7.12 release In-Reply-To: References: Message-ID: Proposal: Make Semigroup as a superclass of Monoid https://mail.haskell.org/pipermail/libraries/2015-April/025590.html From hvriedel at gmail.com Tue Sep 1 12:06:56 2015 From: hvriedel at gmail.com (Herbert Valerio Riedel) Date: Tue, 01 Sep 2015 14:06:56 +0200 Subject: Planning for the 7.12 release In-Reply-To: (Harry .'s message of "Tue, 1 Sep 2015 11:57:17 +0000") References: Message-ID: <87613u1gyn.fsf@gmail.com> On 2015-09-01 at 13:57:17 +0200, Harry . wrote: > Proposal: Make Semigroup as a superclass of Monoid > https://mail.haskell.org/pipermail/libraries/2015-April/025590.html The plan is to (at the very least) move Data.Semigroups and Data.List.NonEmpty to base for GHC 7.12 If we have enough time we will also implement compile-warnings in GHC 7.12 to prepare for the next phases, if not they'll follow with the next major release after GHC 7.12 (effectively extending/delaying the migration-plan[1] by one year) [1]: https://mail.haskell.org/pipermail/libraries/2015-March/025413.html From johan.tibell at gmail.com Tue Sep 1 17:23:35 2015 From: johan.tibell at gmail.com (Johan Tibell) Date: Tue, 1 Sep 2015 10:23:35 -0700 Subject: RFC: Unpacking sum types Message-ID: I have a draft design for unpacking sum types that I'd like some feedback on. In particular feedback both on: * the writing and clarity of the proposal and * the proposal itself. https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes -- Johan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dan.doel at gmail.com Tue Sep 1 18:31:14 2015 From: dan.doel at gmail.com (Dan Doel) Date: Tue, 1 Sep 2015 14:31:14 -0400 Subject: RFC: Unpacking sum types In-Reply-To: References: Message-ID: I wonder: are there issues with strict/unpacked fields in the sum type, with regard to the 'fill in stuff' behavior? For example: data C = C1 !Int | C2 ![Int] data D = D1 !Double {-# UNPACK #-} !C Naively we might think: data D' = D1 !Double !Tag !Int ![Int] But this is obviously not going to work at the Haskell-implemented-level. Since we're at a lower level, we could just not seq the things from the opposite constructor, but are there problems that arise from that? Also of course the !Int will probably also be unpacked, so such prim types need different handling (fill with 0, I guess). -- Also, I guess this is orthogonal, but having primitive, unboxed sums (analogous to unboxed tuples) would be nice as well. Conceivably they could be used as part of the specification of unpacked sums, since we can apparently put unboxed tuples in data types now. I'm not certain if they would cover all cases, though (like the strictness concerns above). -- Dan On Tue, Sep 1, 2015 at 1:23 PM, Johan Tibell wrote: > I have a draft design for unpacking sum types that I'd like some feedback > on. In particular feedback both on: > > * the writing and clarity of the proposal and > * the proposal itself. > > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > > -- Johan > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From thomasmiedema at gmail.com Tue Sep 1 18:34:08 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Tue, 1 Sep 2015 20:34:08 +0200 Subject: Proposal: accept pull requests on GitHub Message-ID: Hello all, my arguments against Phabricator are here: https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator. 
Some quotes from #ghc to pique your curiosity (there are some 50 more):
* "is arc broken today?"
* "arc is a frickin' mystery."
* "i have a theory that i've managed to create a revision that phab can't handle."
* "Diffs just seem to be too expensive to create ... I can't blame contributors for not wanting to do this for every atomic change"
* "but seriously, we can't require this for contributing to GHC... the entry barrier is already high enough"

GitHub has side-by-side diffs nowadays, and Travis-CI can run `./validate --fast` comfortably.

*Proposal: accept pull requests from contributors on https://github.com/ghc/ghc.*

Details:
* use Travis-CI to validate pull requests.
* keep using the Trac issue tracker (contributors are encouraged to put a link to their pull-request in the 'Differential Revisions' field).
* keep using the Trac wiki.
* in discussions on GitHub, use https://ghc.haskell.org/ticket/1234 to refer to Trac ticket 1234. The shortcut #1234 only works on Trac itself.
* keep pushing to git.haskell.org, where the existing Git receive hooks can do their job keeping tabs, trailing whitespace and dangling submodule references out, notify Trac and send emails. Committers close pull-requests manually, just like they do Trac tickets.
* keep running Phabricator for as long as necessary.
* mention that pull requests are accepted on https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/FixingBugs.

My expectation is that the majority of patches will start coming in via pull requests, the number of contributions will go up, commits will be smaller, and there will be more of them per pull request (contributors will be able to put style changes and refactorings into separate commits, without jumping through a bunch of hoops).

Reviewers will get many more emails. Other arguments against GitHub are here: https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub.

I probably missed a few things, so fire away.
Thanks,
Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at nh2.me Tue Sep 1 20:42:21 2015
From: mail at nh2.me (Niklas Hambüchen)
Date: Tue, 01 Sep 2015 22:42:21 +0200
Subject: Proposal: accept pull requests on GitHub
In-Reply-To:
References:
Message-ID: <55E60DAD.503@nh2.me>

Hi,

I would recommend against moving code reviews to Github. I like it and use it all the time for my own projects, but for a large project like GHC, its code reviews are too basic (comments get lost in multi-round reviews), and its customisation and process enforcement are too weak; but that has all been mentioned already on the https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub page you linked.

I do however recommend accepting pull requests via Github.

This is already the case for simple changes: In the past I asked Austin "can you pull this from my branch on Github called XXX", and it went in without problems and without me having to use arc locally.

But this process could be more automated:

For Ganeti (cluster manager made by Google, written largely in Haskell) I built a tool (https://github.com/google/pull-request-mailer) that listens for pull requests and sends them to the mailing list (Ganeti's preferred way of accepting patches and doing reviews). We built it because some people (me included) liked the Github workflow (push branch, click button) more than `git format-patch`+`git send-email`. You can see an example at https://github.com/ganeti/ganeti/pull/22. The tool then replies on Github asking that discussion of the change please be held on the mailing list. That has worked so far. It can also handle force-pushes when a PR gets updated based on feedback. Writing it and setting it up only took a few days.

I think it wouldn't be too difficult to do the same for GHC: A small tool that imports Github PRs into Phabricator.

I don't like the arc user experience.
It's modeled in the same way as ReviewBoard, and just pushing a branch is easier in my opinion. However, Phabricator is quite good as a review tool. Its inability to review multiple commits is nasty, but I guess that'll be fixed at some point. If not, such an import tool as I suggest could do the squashing for you.

Unfortunately there is currently no open source review tool that can handle reviewing entire branches AND multiple revisions of such branches. It's possible to build them though; some companies have internal review tools that do it and they work extremely well.

I believe that a simple automated import setup could address many of the points in https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator.

Niklas

On 01/09/15 20:34, Thomas Miedema wrote:
> Hello all,
>
> my arguments against Phabricator are here:
> https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator.
>
> Some quotes from #ghc to pique your curiosity (there are some 50 more):
> * "is arc broken today?"
> * "arc is a frickin' mystery."
> * "i have a theory that i've managed to create a revision that phab
> can't handle."
> * "Diffs just seem to be too expensive to create ... I can't blame
> contributors for not wanting to do this for every atomic change"
> * "but seriously, we can't require this for contributing to GHC... the
> entry barrier is already high enough"
>
> GitHub has side-by-side diffs nowadays, and
> Travis-CI can run `./validate --fast` comfortably.
>
> *Proposal: accept pull requests from contributors on
> https://github.com/ghc/ghc.*
>
> Details:
> * use Travis-CI to validate pull requests.
> * keep using the Trac issue tracker (contributors are encouraged to put
> a link to their pull-request in the 'Differential Revisions' field).
> * keep using the Trac wiki.
> * in discussions on GitHub, use https://ghc.haskell.org/ticket/1234 to
> refer to Trac ticket 1234. The shortcut #1234 only works on Trac itself.
> * keep pushing to git.haskell.org, where the existing Git receive hooks
> can do their job keeping tabs, trailing whitespace and dangling
> submodule references out, notify Trac and send emails. Committers close
> pull-requests manually, just like they do Trac tickets.
> * keep running Phabricator for as long as necessary.
> * mention that pull requests are accepted on
> https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/FixingBugs.
>
> My expectation is that the majority of patches will start coming in via
> pull requests, the number of contributions will go up, commits will be
> smaller, and there will be more of them per pull request (contributors
> will be able to put style changes and refactorings into separate
> commits, without jumping through a bunch of hoops).
>
> Reviewers will get many more emails. Other arguments against GitHub are
> here: https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub.
>
> I probably missed a few things, so fire away.
>
> Thanks,
> Thomas
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From johan.tibell at gmail.com Wed Sep 2 01:09:48 2015
From: johan.tibell at gmail.com (Johan Tibell)
Date: Tue, 1 Sep 2015 18:09:48 -0700
Subject: RFC: Unpacking sum types
In-Reply-To:
References:
Message-ID:

After some discussions with SPJ I've now rewritten the proposal in terms of unboxed sums (which should not suffer from the extra seq problem you mention above).

On Tue, Sep 1, 2015 at 11:31 AM, Dan Doel wrote:
> I wonder: are there issues with strict/unpacked fields in the sum
> type, with regard to the 'fill in stuff' behavior?
>
> For example:
>
> data C = C1 !Int | C2 ![Int]
>
> data D = D1 !Double {-# UNPACK #-} !C
>
> Naively we might think:
>
> data D' = D1 !Double !Tag !Int ![Int]
>
> But this is obviously not going to work at the
> Haskell-implemented-level.
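Dan's "naive" flattening can be written out concretely. The following is purely an illustration of the representation being discussed in this thread; the Tag type, names, and fill-in defaults are invented, and this is not how GHC actually implements anything:

```haskell
-- Hypothetical flattening of
--   data C = C1 !Int | C2 ![Int]
--   data D = D1 !Double {-# UNPACK #-} !C
-- into one constructor carrying a tag plus the union of the fields
-- of both alternatives.
data Tag = IsC1 | IsC2

data D' = D1' !Double !Tag Int [Int]
-- With IsC1 the Int field is live and the [Int] field is a dummy;
-- with IsC2 the [Int] field is live and the Int field is filled
-- with 0. The payload fields lose their ! annotations here on
-- purpose: forcing the dummy field of the other alternative is
-- exactly the seq problem raised above.
```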
> Since we're at a lower level, we could just not seq the things from
> the opposite constructor, but are there problems that arise from that?
> Also of course the !Int will probably also be unpacked, so such prim
> types need different handling (fill with 0, I guess).
>
> --
>
> Also, I guess this is orthogonal, but having primitive, unboxed sums
> (analogous to unboxed tuples) would be nice as well. Conceivably they
> could be used as part of the specification of unpacked sums, since we
> can apparently put unboxed tuples in data types now. I'm not certain
> if they would cover all cases, though (like the strictness concerns
> above).
>
> -- Dan
>
> On Tue, Sep 1, 2015 at 1:23 PM, Johan Tibell wrote:
> > I have a draft design for unpacking sum types that I'd like some feedback
> > on. In particular feedback both on:
> >
> > * the writing and clarity of the proposal and
> > * the proposal itself.
> >
> > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes
> >
> > -- Johan
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rrnewton at gmail.com Wed Sep 2 01:44:03 2015
From: rrnewton at gmail.com (Ryan Newton)
Date: Wed, 02 Sep 2015 01:44:03 +0000
Subject: RFC: Unpacking sum types
In-Reply-To:
References:
Message-ID:

Just a small comment about syntax. Why is there an "_n" suffix on the type constructor? Isn't it syntactically evident how many things are in the |# .. | .. #| block?

More generally, are the parser changes and the wild new syntax strictly necessary? Could we instead just have a new keyword, but have it look like a normal type constructor? For example, the type:

(Sum# T1 T2 T3)

where "Sum#" can't be partially applied, and is variable arity.
Likewise, "MkSum#" could be a keyword/syntactic-form: (MkSum# 1 3 expr) case x of MkSum# 1 3 v -> e Here "1" and "3" are part of the syntactic form, not expressions. But it can probably be handled after parsing and doesn't require the "_n_m" business. -Ryan On Tue, Sep 1, 2015 at 6:10 PM Johan Tibell wrote: > After some discussions with SPJ I've now rewritten the proposal in terms > of unboxed sums (which should suffer from the extra seq problem you mention > above). > > On Tue, Sep 1, 2015 at 11:31 AM, Dan Doel wrote: > >> I wonder: are there issues with strict/unpacked fields in the sum >> type, with regard to the 'fill in stuff' behavior? >> >> For example: >> >> data C = C1 !Int | C2 ![Int] >> >> data D = D1 !Double {-# UNPACK #-} !C >> >> Naively we might think: >> >> data D' = D1 !Double !Tag !Int ![Int] >> >> But this is obviously not going to work at the >> Haskell-implemented-level. Since we're at a lower level, we could just >> not seq the things from the opposite constructor, but are there >> problems that arise from that? Also of course the !Int will probably >> also be unpacked, so such prim types need different handling (fill >> with 0, I guess). >> >> -- >> >> Also, I guess this is orthogonal, but having primitive, unboxed sums >> (analogous to unboxed tuples) would be nice as well. Conceivably they >> could be used as part of the specification of unpacked sums, since we >> can apparently put unboxed tuples in data types now. I'm not certain >> if they would cover all cases, though (like the strictness concerns >> above). >> >> -- Dan >> >> >> On Tue, Sep 1, 2015 at 1:23 PM, Johan Tibell >> wrote: >> > I have a draft design for unpacking sum types that I'd like some >> feedback >> > on. In particular feedback both on: >> > >> > * the writing and clarity of the proposal and >> > * the proposal itself. 
>> >
>> > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes
>> >
>> > -- Johan
>> >
>> > _______________________________________________
>> > ghc-devs mailing list
>> > ghc-devs at haskell.org
>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at joachim-breitner.de Wed Sep 2 02:12:27 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 01 Sep 2015 19:12:27 -0700
Subject: RFC: Unpacking sum types
In-Reply-To:
References:
Message-ID: <1441159947.3393.13.camel@joachim-breitner.de>

Hi,

On Wednesday, 2015-09-02, at 01:44 +0000, Ryan Newton wrote:
> Why is there an "_n" suffix on the type constructor? Isn't it
> syntactically evident how many things are in the |# .. | .. #|
> block?

Correct.

> More generally, are the parser changes and the wild new syntax
> strictly necessary?

If we just add it to Core, to support UNPACK, then there is no parser involved anyways, and the pretty-printer may do fancy stuff. (Why not unicode subscript numbers like ? :-))

But we probably want to provide this also on the Haskell level, just like unboxed products, right? Then we should have a nice syntax. Personally, I find (# a | b | c #) visually more pleasing. (The disadvantage is that this works only for two or more alternatives, but the one-alternative unboxed union is isomorphic to the one-element unboxed tuple anyways, isn't it?)

> Likewise, "MkSum#" could be a keyword/syntactic-form:
>
> (MkSum# 1 3 expr)
> case x of MkSum# 1 3 v -> e
>
> Here "1" and "3" are part of the syntactic form, not expressions.
> But it can probably be handled after parsing and doesn't require the
> "_n_m" business.

If we expose it on the Haskell level, I find MkSum_1_2# the right thing to do: It makes it clear that (conceptually) there really is a constructor of that name, and it is distinct from MkSum_2_2#, and the user cannot do computation with these indices.
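To make the two notations being compared concrete, here is a purely illustrative sketch; neither form was accepted GHC syntax at the time, and all names are hypothetical:

```haskell
-- Building a value of the two-alternative unboxed sum of Int and
-- [Int], in the two proposed spellings:
--
-- Indexed-constructor style: the alternative index and total arity
-- are part of the (conceptual) constructor name, so
--   MkSum_1_2# 5       -- first alternative of two
--   MkSum_2_2# [5]     -- second alternative of two
-- are distinct constructors, and the indices are syntax, not
-- expressions the program can compute with.
--
-- Bracket style, following the (# a | b | c #) suggestion:
--   (# 5 | #)          -- value in the first slot
--   (# | [5] #)        -- value in the second slot
```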
Greetings,
Joachim

--
Joachim "nomeata" Breitner
mail at joachim-breitner.de - http://www.joachim-breitner.de/
Jabber: nomeata at joachim-breitner.de - GPG-Key: 0xF0FBF51F
Debian Developer: nomeata at debian.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL:

From rrnewton at gmail.com Wed Sep 2 02:22:24 2015
From: rrnewton at gmail.com (Ryan Newton)
Date: Wed, 02 Sep 2015 02:22:24 +0000
Subject: RFC: Unpacking sum types
In-Reply-To: <1441159947.3393.13.camel@joachim-breitner.de>
References: <1441159947.3393.13.camel@joachim-breitner.de>
Message-ID:

> If we expose it on the Haskell level, I find MkSum_1_2# the right thing
> to do: It makes it clear that (conceptually) there really is a
> constructor of that name, and it is distinct from MkSum_2_2#, and the
> user cannot do computation with these indices.

I don't mind MkSum_1_2#, it avoids the awkwardness of attaching it to a closing delimiter. But... it does still introduce the idea of cutting up tokens to get numbers out of them, which is kind of hacky. (There seems to be a conserved particle of hackiness here that can't be eliminated, but it doesn't bother me too much.)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at joachim-breitner.de Wed Sep 2 05:58:53 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 01 Sep 2015 22:58:53 -0700
Subject: RFC: Unpacking sum types
In-Reply-To:
References:
Message-ID:

Hi,

just an idea that crossed my mind: Can we do without the worker/wrapper dance for data constructors if we instead phrase that in terms of pattern synonyms? Maybe that's a refactoring/code consolidation opportunity.

Good night,
Joachim

On September 1, 2015 10:23:35 PDT, Johan Tibell wrote:
>I have a draft design for unpacking sum types that I'd like some
>feedback
>on.
>In particular feedback both on:
>
>* the writing and clarity of the proposal and
>* the proposal itself.
>
>https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes
>
>-- Johan
>
>------------------------------------------------------------------------
>
>_______________________________________________
>ghc-devs mailing list
>ghc-devs at haskell.org
>http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From michael at diglumi.com Wed Sep 2 07:47:25 2015
From: michael at diglumi.com (Michael Smith)
Date: Wed, 2 Sep 2015 00:47:25 -0700
Subject: Shared data type for extension flags
Message-ID:

#10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the capability to Template Haskell to detect which language extensions are enabled. Unfortunately, since template-haskell can't depend on ghc (as ghc depends on template-haskell), it can't simply re-export the ExtensionFlag type from DynFlags to the user.

There is a second data type encoding the list of possible language extensions in the Cabal package, in Language.Haskell.Extension [3]. But template-haskell doesn't already depend on Cabal, and doing so seems like it would cause difficulties, as the two packages can be upgraded separately.

So adding this new feature to Template Haskell requires introducing a *third* data type for language extensions. It also requires enumerating this full list in two more places, to convert back and forth between the TH Extension data type and GHC's internal ExtensionFlag data type.

Is there another way here? Can there be one single shared data type for this somehow?

[1] https://ghc.haskell.org/trac/ghc/ticket/10820
[2] https://phabricator.haskell.org/D1200
[3] https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From matthewtpickering at gmail.com Wed Sep 2 08:00:40 2015 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Wed, 2 Sep 2015 10:00:40 +0200 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: Surely the easiest way here (including for other tooling - ie haskell-src-exts) is to create a package which just provides this enumeration. GHC, cabal, th, haskell-src-exts and so on then all depend on this package rather than creating their own enumeration. On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith wrote: > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the > capababilty > to Template Haskell to detect which language extensions enabled. > Unfortunately, > since template-haskell can't depend on ghc (as ghc depends on > template-haskell), > it can't simply re-export the ExtensionFlag type from DynFlags to the user. > > There is a second data type encoding the list of possible language > extensions in > the Cabal package, in Language.Haskell.Extension [3]. But template-haskell > doesn't already depend on Cabal, and doing so seems like it would cause > difficulties, as the two packages can be upgraded separately. > > So adding this new feature to Template Haskell requires introducing a > *third* > data type for language extensions. It also requires enumerating this full > list > in two more places, to convert back and forth between the TH Extension data > type > and GHC's internal ExtensionFlag data type. > > Is there another way here? Can there be one single shared data type for this > somehow? 
> > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 > [2] https://phabricator.haskell.org/D1200 > [3] > https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From michael at diglumi.com Wed Sep 2 08:20:30 2015 From: michael at diglumi.com (Michael Smith) Date: Wed, 2 Sep 2015 01:20:30 -0700 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: That sounds like a good approach. Are there other things that would go nicely in a shared package like this, in addition to the extension data type? On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering < matthewtpickering at gmail.com> wrote: > Surely the easiest way here (including for other tooling - ie > haskell-src-exts) is to create a package which just provides this > enumeration. GHC, cabal, th, haskell-src-exts and so on then all > depend on this package rather than creating their own enumeration. > > On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith wrote: > > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the > > capababilty > > to Template Haskell to detect which language extensions enabled. > > Unfortunately, > > since template-haskell can't depend on ghc (as ghc depends on > > template-haskell), > > it can't simply re-export the ExtensionFlag type from DynFlags to the > user. > > > > There is a second data type encoding the list of possible language > > extensions in > > the Cabal package, in Language.Haskell.Extension [3]. But > template-haskell > > doesn't already depend on Cabal, and doing so seems like it would cause > > difficulties, as the two packages can be upgraded separately. > > > > So adding this new feature to Template Haskell requires introducing a > > *third* > > data type for language extensions. 
It also requires enumerating this full > > list > > in two more places, to convert back and forth between the TH Extension > data > > type > > and GHC's internal ExtensionFlag data type. > > > > Is there another way here? Can there be one single shared data type for > this > > somehow? > > > > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 > > [2] https://phabricator.haskell.org/D1200 > > [3] > > > https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html > > > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at well-typed.com Wed Sep 2 10:43:57 2015 From: ben at well-typed.com (Ben Gamari) Date: Wed, 02 Sep 2015 12:43:57 +0200 Subject: more releases In-Reply-To: References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> Message-ID: <87oahlksnm.fsf@smart-cactus.org> Richard Eisenberg writes: > On Sep 1, 2015, at 12:01 AM, Herbert Valerio Riedel wrote: > >> I'd say mostly organisational overhead which can't be fully automated >> (afaik, Ben has already automated large parts but not everything can be): >> >> - Coordinating with people creating and testing the bindists > > This was the sort of thing I thought could be automated. I'm picturing > a system where Austin/Ben hits a button and everything whirs to life, > creating, testing, and posting bindists, with no people involved. > I can nearly do this for Linux with my existing tools. I can do 32- and 64-bit builds for both RedHat and Debian all on a single Debian 8 machine with the tools I developed during the course of the 7.10.2 release [1]. Windows is unfortunately still a challenge. I did the 7.10.2 builds on an EC2 instance and the experience wasn't terribly fun. I would love for this to be further automated but I've not done this yet. 
>> - Writing release notes & announcement
>
> Release notes should, theoretically, be updated with the patches.
> Announcement can be automated.
>
If I'm doing my job well the release notes shouldn't be a problem. I've been trying to be meticulous about ensuring that all new features come with acceptable release notes.

>> - If bundled core-libraries are affected, coordination overhead with package
>> maintainers (unless GHC HQ owned), verifying version bumps (API diff!) and
>> changelogs have been updated accordingly, uploading to Hackage
>
> Any library version change would require a more proper release. Do
> these libraries tend to change during a major release cycle?
>
The core libraries are perhaps the trickiest part of this. Currently the process goes something like this:

1. We branch off a stable GHC release
2. Development continues on `master`, eventually a breaking change is merged to one of the libraries
3. Eventually someone notices and bumps the library's version
4. More breaking changes are merged to the library
5. We branch off for another stable release, right before the release we manually push the libraries to Hackage
6. Repeat from (2)

There can potentially be a lot of interface churn between steps 3 and 5. If we did releases in this period we would need to be much more careful about library versioning. I suspect this may end up being quite a bit of work to do properly.

Technically we could punt on this problem and just do the same sort of stable/unstable versioning for the libraries that we already do with GHC itself. This would mean, however, that we couldn't upload the libraries to Hackage.

>> - Uploading and signing packages to download.haskell.org, and verifying
>> the downloads
>
> This isn't automated?
>
It is now (see [2]). This shouldn't be a problem.
But I wanted to see if there were > technical/practical/social reasons why this was or wasn't a good idea. > If we do think it's a good idea absent those reasons, then we can work > on addressing those concerns. > Technically I think there are no reasons why this isn't feasible with some investment. Exactly how much investment depends upon what exactly we want to achieve, * How often do we make these releases? * Which platforms do we support? * How carefully do we version included libraries? If we focus solely on Linux and punt on the library versioning issue I would say this wouldn't even difficult. I could easily setup my build machine to do a nightly bindist and push it to a server somewhere. Austin has also mentioned that Harbormaster builds could potentially produce bindists. The question is whether users want more rapid releases. Those working on GHC will use their own builds. Most users want something reasonably stable (in both the interface sense and the reliability sense) and therefore I suspect would stick with the releases. This leaves a relatively small number of potential users; namely those who want to play around with unreleased features yet aren't willing to do their own builds. Cheers, - Ben [1] https://github.com/bgamari/ghc-utils [2] https://github.com/bgamari/ghc-utils/blob/master/rel-eng/upload.sh -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 472 bytes Desc: not available URL: From hvriedel at gmail.com Wed Sep 2 10:49:32 2015 From: hvriedel at gmail.com (Herbert Valerio Riedel) Date: Wed, 02 Sep 2015 12:49:32 +0200 Subject: more releases In-Reply-To: <87oahlksnm.fsf@smart-cactus.org> (Ben Gamari's message of "Wed, 02 Sep 2015 12:43:57 +0200") References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> Message-ID: <87fv2xcczn.fsf@gmail.com> On 2015-09-02 at 12:43:57 +0200, Ben Gamari wrote: [...] > The question is whether users want more rapid releases. Those working on > GHC will use their own builds. Most users want something reasonably > stable (in both the interface sense and the reliability sense) and > therefore I suspect would stick with the releases. This leaves a > relatively small number of potential users; namely those who want to > play around with unreleased features yet aren't willing to do their own > builds. Btw, for those who are willing to use Ubuntu there's already GHC HEAD builds available in my PPA, and I can easily keep creating GHC 7.10.3 snapshots in the same style like I usually do shortly before a stable point-release. From simonpj at microsoft.com Wed Sep 2 14:33:23 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Wed, 2 Sep 2015 14:33:23 +0000 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: we already have such a shared library, I think: bin-package-db. would that do? Simon From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Michael Smith Sent: 02 September 2015 09:21 To: Matthew Pickering Cc: GHC developers Subject: Re: Shared data type for extension flags That sounds like a good approach. Are there other things that would go nicely in a shared package like this, in addition to the extension data type? 
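The shared enumeration being asked about could in principle be a tiny standalone package; here is a hypothetical sketch (module name and constructor selection invented for illustration, with only a few extensions shown):

```haskell
-- | One authoritative enumeration of language extensions, intended
-- to be depended on by ghc, template-haskell, Cabal,
-- haskell-src-exts, and friends. (Hypothetical module; not an
-- existing package.)
module Language.Haskell.Extension.Known where

data KnownExtension
  = OverloadedStrings
  | TemplateHaskell
  | BangPatterns
  | DataKinds
  -- ... one constructor per language extension
  deriving (Eq, Ord, Show, Read, Enum, Bounded)
```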
On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering > wrote: Surely the easiest way here (including for other tooling - ie haskell-src-exts) is to create a package which just provides this enumeration. GHC, cabal, th, haskell-src-exts and so on then all depend on this package rather than creating their own enumeration. On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith > wrote: > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the > capababilty > to Template Haskell to detect which language extensions enabled. > Unfortunately, > since template-haskell can't depend on ghc (as ghc depends on > template-haskell), > it can't simply re-export the ExtensionFlag type from DynFlags to the user. > > There is a second data type encoding the list of possible language > extensions in > the Cabal package, in Language.Haskell.Extension [3]. But template-haskell > doesn't already depend on Cabal, and doing so seems like it would cause > difficulties, as the two packages can be upgraded separately. > > So adding this new feature to Template Haskell requires introducing a > *third* > data type for language extensions. It also requires enumerating this full > list > in two more places, to convert back and forth between the TH Extension data > type > and GHC's internal ExtensionFlag data type. > > Is there another way here? Can there be one single shared data type for this > somehow? > > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 > [2] https://phabricator.haskell.org/D1200 > [3] > https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From simonpj at microsoft.com Wed Sep 2 14:43:54 2015
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Wed, 2 Sep 2015 14:43:54 +0000
Subject: [Haskell] ETA on 7.10.3?
In-Reply-To: <6AEDE614-430C-4A68-9B2E-22B8FC0275FB@gmail.com>
References: <87vbbtl2qz.fsf@smart-cactus.org> <6AEDE614-430C-4A68-9B2E-22B8FC0275FB@gmail.com>
Message-ID:

Ah, well https://github.com/ku-fpg/hermit/issues/144#issuecomment-128762767 links in turn to https://github.com/ku-fpg/hermit/issues/141, which is a long thread I can't follow.

Ryan, Andy: if 7.10.2 is unusable for you, for some reason, please make a ticket to explain why, and ask for 7.10.3.

Simon

From: Haskell [mailto:haskell-bounces at haskell.org] On Behalf Of David Banas
Sent: 02 September 2015 13:19
To: Ben Gamari
Cc: haskell at haskell.org
Subject: Re: [Haskell] ETA on 7.10.3?

Hi Ben,

Thanks for your reply.

My problem is the project I'm currently working on is dependent upon HERMIT, which doesn't play well with 7.10.2, as per:

https://github.com/ku-fpg/hermit/issues/144#issuecomment-128762767

(The nature of that comment caused me to think that 7.10.3 was in play.)

Thanks,
-db

On Sep 2, 2015, at 12:05 AM, Ben Gamari wrote:

David Banas writes:

Hi,

Does anyone have an ETA for ghc v7.10.3? (I'm trying to decide between waiting and backing up to 7.8.2, for a particular project.)

Currently there are no plans to do a 7.10.3 release. 7.10.2 does have a few issues, but none of them appear critical enough to burn maintenance time on.

Of course, we are willing to reevaluate in the event that new issues arise. What problems with 7.10.2 are you struggling with?

Cheers,

- Ben

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From afarmer at ittc.ku.edu Wed Sep 2 15:40:15 2015
From: afarmer at ittc.ku.edu (Andrew Farmer)
Date: Wed, 2 Sep 2015 08:40:15 -0700
Subject: [Haskell] ETA on 7.10.3?
In-Reply-To: References: <87vbbtl2qz.fsf@smart-cactus.org> <6AEDE614-430C-4A68-9B2E-22B8FC0275FB@gmail.com> Message-ID: Sorry, I dropped the ball on creating a ticket. I just did so: https://ghc.haskell.org/trac/ghc/ticket/10829 (As an aside, the original ticket, #10528, had a milestone set as 7.10.3, so I just assumed a 7.10.3 was planned and coming soon.) On Wed, Sep 2, 2015 at 7:43 AM, Simon Peyton Jones wrote: > Ah, well https://github.com/ku-fpg/hermit/issues/144#issuecomment-128762767 > > links in turn to https://github.com/ku-fpg/hermit/issues/141, which is a > long thread I can't follow. > > > > Ryan, Andy: if 7.10.2 is unusable for you, for some reason, please make a > ticket to explain why, and ask for 7.10.3. > > > Simon > > > > From: Haskell [mailto:haskell-bounces at haskell.org] On Behalf Of David Banas > Sent: 02 September 2015 13:19 > To: Ben Gamari > Cc: haskell at haskell.org > Subject: Re: [Haskell] ETA on 7.10.3? > > > > Hi Ben, > > > > Thanks for your reply. > > > > My problem is that the project I'm currently working on is dependent upon HERMIT, > which doesn't play well with 7.10.2, as per: > > > > https://github.com/ku-fpg/hermit/issues/144#issuecomment-128762767 > > > > (The nature of that comment caused me to think that 7.10.3 was in play.) > > > > Thanks, > > -db > > > > On Sep 2, 2015, at 12:05 AM, Ben Gamari wrote: > > > > David Banas writes: > > > Hi, > > Does anyone have an ETA for ghc v7.10.3? > (I'm trying to decide between waiting and backing up to 7.8.2, for a > particular project.) > > Currently there are no plans to do a 7.10.3 release. 7.10.2 does have a > few issues, but none of them are critical regressions, and none of them > appear critical enough to burn maintenance time on. > > Of course, we are willing to reevaluate in the event that new issues > arise. What problems with 7.10.2 are you struggling with? 
> > Cheers, > > - Ben > > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From greg at gregweber.info Wed Sep 2 15:43:02 2015 From: greg at gregweber.info (Greg Weber) Date: Wed, 2 Sep 2015 08:43:02 -0700 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <55E60DAD.503@nh2.me> References: <55E60DAD.503@nh2.me> Message-ID: I like Niklas's suggestion of a middle-ground approach. There are benefits to using phabricator (and arc), but there should be a lowered-bar approach where people can start contributing through github (even though they may be forced to do the code review on phabricator). On Tue, Sep 1, 2015 at 1:42 PM, Niklas Hambüchen wrote: > Hi, > > I would recommend against moving code reviews to Github. > I like it and use it all the time for my own projects, but for a large > project like GHC, its code reviews are too basic (comments get lost in > multi-round reviews), and its customisation and process enforcement are > too weak; but that has all been mentioned already on the > https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub page you linked. > > I do however recommend accepting pull requests via Github. > > This is already the case for simple changes: In the past I asked Austin > "can you pull this from my branch on Github called XXX", and it went in > without problems and without me having to use arc locally. > > But this process could be more automated: > > For Ganeti (cluster manager made by Google, written largely in Haskell) > I built a tool (https://github.com/google/pull-request-mailer) that > listens for pull requests and sends them to the mailing list (Ganeti's > preferred way of accepting patches and doing reviews). We built it > because some people (me included) liked the Github workflow (push > branch, click button) more than `git format-patch`+`git send-email`. 
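[The core transformation such a PR-to-mailing-list bridge performs can be sketched in a few lines of Haskell. All names and fields below are invented for illustration; this is not the actual pull-request-mailer API.]

```haskell
-- Hedged sketch: turn pull-request metadata into a patch-series email.
-- The PullRequest type and its fields are hypothetical.
data PullRequest = PullRequest
  { prNumber  :: Int
  , prTitle   :: String
  , prAuthor  :: String
  , prCommits :: [String]  -- rendered patches, one per commit
  }

-- Cover-letter subject in the usual "[PATCH 0/N]" style.
subjectFor :: PullRequest -> String
subjectFor pr =
  "[PATCH 0/" ++ show (length (prCommits pr)) ++ "] "
    ++ prTitle pr ++ " (PR #" ++ show (prNumber pr) ++ ")"

-- Body: attribution, the pointer back to the list, then the patches.
bodyFor :: PullRequest -> String
bodyFor pr = unlines $
  ("From: " ++ prAuthor pr)
    : "Please hold discussion of this change on the mailing list."
    : prCommits pr
```

[Everything else such a tool does -- webhook handling, force-push detection, replying on GitHub -- is plumbing around a pure function of roughly this shape.]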
You > can see an example at https://github.com/ganeti/ganeti/pull/22. > The tool then replies on Github asking that discussion of the change be > held on the mailing list. That has worked so far. > It can also handle force-pushes when a PR gets updated based on > feedback. Writing it and setting it up only took a few days. > > I think it wouldn't be too difficult to do the same for GHC: A small > tool that imports Github PRs into Phabricator. > > I don't like the arc user experience. It's modeled in the same way as > ReviewBoard, and just pushing a branch is easier in my opinion. > > However, Phabricator is quite good as a review tool. Its inability to > review multiple commits is nasty, but I guess that'll be fixed at some > point. If not, such an import tool as I suggest could do the squashing for > you. > > Unfortunately there is currently no open source review tool that can > handle reviewing entire branches AND multiple revisions of such > branches. It's possible to build them, though; some companies have > internal review tools that do it and they work extremely well. > > I believe that a simple automated import setup could address many of the > points in https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator. > > Niklas > > On 01/09/15 20:34, Thomas Miedema wrote: > > Hello all, > > > > my arguments against Phabricator are here: > > https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator. > > > > Some quotes from #ghc to pique your curiosity (there are some 50 more): > > * "is arc broken today?" > > * "arc is a frickin' mystery." > > * "i have a theory that i've managed to create a revision that phab > > can't handle." > > * "Diffs just seem to be too expensive to create ... I can't blame > > contributors for not wanting to do this for every atomic change" > > * "but seriously, we can't require this for contributing to GHC... 
the > > entry barrier is already high enough" > > > > GitHub has side-by-side diffs > > nowadays, and > > Travis-CI can run `./validate --fast` comfortably > > . > > > > *Proposal: accept pull requests from contributors on > > https://github.com/ghc/ghc.* > > > > Details: > > * use Travis-CI to validate pull requests. > > * keep using the Trac issue tracker (contributors are encouraged to put > > a link to their pull-request in the 'Differential Revisions' field). > > * keep using the Trac wiki. > > * in discussions on GitHub, use https://ghc.haskell.org/ticket/1234 to > > refer to Trac ticket 1234. The shortcut #1234 only works on Trac itself. > > * keep pushing to git.haskell.org , where the > > existing Git receive hooks can do their job keeping tabs, trailing > > whitespace and dangling submodule references out, notify Trac and send > > emails. Committers close pull-requests manually, just like they do Trac > > tickets. > > * keep running Phabricator for as long as necessary. > > * mention that pull requests are accepted on > > https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/FixingBugs. > > > > My expectation is that the majority of patches will start coming in via > > pull requests, the number of contributions will go up, commits will be > > smaller, and there will be more of them per pull request (contributors > > will be able to put style changes and refactorings into separate > > commits, without jumping through a bunch of hoops). > > > > Reviewers will get many more emails. Other arguments against GitHub are > > here: https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub. > > > > I probably missed a few things, so fire away. 
> > > > Thanks, > > Thomas > > > > > > > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eir at cis.upenn.edu Wed Sep 2 15:44:15 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Wed, 2 Sep 2015 08:44:15 -0700 Subject: more releases In-Reply-To: <87oahlksnm.fsf@smart-cactus.org> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> Message-ID: I think some of my idea was misunderstood here: my goal was to have quick releases only from the stable branch. The goal would not be to release the new and shiny, but instead to get bugfixes out to users quicker. The new and shiny (master) would remain as it is now. In other words: more users would be affected by this change than just the vanguard. Richard On Sep 2, 2015, at 3:43 AM, Ben Gamari wrote: > Richard Eisenberg writes: > >> On Sep 1, 2015, at 12:01 AM, Herbert Valerio Riedel wrote: >> >>> I'd say mostly organisational overhead which can't be fully automated >>> (afaik, Ben has already automated large parts but not everything can be): >>> >>> - Coordinating with people creating and testing the bindists >> >> This was the sort of thing I thought could be automated. I'm picturing >> a system where Austin/Ben hits a button and everything whirs to life, >> creating, testing, and posting bindists, with no people involved. >> > I can nearly do this for Linux with my existing tools. I can do 32- and > 64-bit builds for both RedHat and Debian all on a single > Debian 8 machine with the tools I developed during the course of the > 7.10.2 release [1]. 
> > Windows is unfortunately still a challenge. I did the 7.10.2 builds on > an EC2 instance and the experience wasn't terribly fun. I would love for > this to be further automated but I've not done this yet. > >>> - Writing releases notes & announcment >> >> Release notes should, theoretically, be updated with the patches. >> Announcement can be automated. >> > If I'm doing my job well the release notes shouldn't be a problem. I've > been trying to be meticulous about ensuring that all new features come > with acceptable release notes. > >>> - If bundled core-libraries are affected, coordination overhead with package >>> maintainers (unless GHC HQ owned), verifying version bumps (API diff!) and >>> changelogs have been updated accordingly, uploading to Hackage >> >> Any library version change would require a more proper release. Do >> these libraries tend to change during a major release cycle? >> > The core libraries are perhaps the trickiest part of this. Currently the > process goes something like this, > > 1. We branch off a stable GHC release > 2. Development continues on `master`, eventually a breaking change is > merged to one of the libraries > 3. Eventually someone notices and bumps the library's version > 4. More breaking changes are merged to the library > 5. We branch off for another stable release, right before the release > we manually push the libraries to Hackage > 6. Repeat from (2) > > There can potentially be a lot of interface churn between steps 3 and 5. > If we did releases in this period we would need to be much more careful > about library versioning. I suspect this may end up being quite a bit of > work to do properly. > > Technically we could punt on this problem and just do the same sort of > stable/unstable versioning for the libraries that we already do with GHC > itself. This would mean, however, that we couldn't upload the libraries > to Hackage. 
> >>> - Uploading and signing packages to download.haskell.org, and verifying > >>> the downloads > >> > >> This isn't automated? > >> > It is now (see [2]). This shouldn't be a problem. > > >>> Austin & Ben probably have more to add to this list > >>> > >> I'm sure they do. > >> > >> Again, I'd be fine if the answer from the community is "it's just not > >> what we need". But I wanted to see if there were > >> technical/practical/social reasons why this was or wasn't a good idea. > >> If we do think it's a good idea absent those reasons, then we can work > >> on addressing those concerns. > >> > Technically I think there are no reasons why this isn't feasible with > some investment. Exactly how much investment depends upon what > exactly we want to achieve, > > * How often do we make these releases? > * Which platforms do we support? > * How carefully do we version included libraries? > > If we focus solely on Linux and punt on the library versioning issue I > would say this wouldn't even be difficult. I could easily set up my build > machine to do a nightly bindist and push it to a server somewhere. > Austin has also mentioned that Harbormaster builds could potentially > produce bindists. > > The question is whether users want more rapid releases. Those working on > GHC will use their own builds. Most users want something reasonably > stable (in both the interface sense and the reliability sense) and > therefore I suspect would stick with the releases. This leaves a > relatively small number of potential users; namely those who want to > play around with unreleased features yet aren't willing to do their own > builds. 
> > Cheers, > > - Ben > > > [1] https://github.com/bgamari/ghc-utils > [2] https://github.com/bgamari/ghc-utils/blob/master/rel-eng/upload.sh From ben at well-typed.com Wed Sep 2 16:04:33 2015 From: ben at well-typed.com (Ben Gamari) Date: Wed, 02 Sep 2015 18:04:33 +0200 Subject: more releases In-Reply-To: References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> Message-ID: <87si6wkdta.fsf@smart-cactus.org> Richard Eisenberg writes: > I think some of my idea was misunderstood here: my goal was to have > quick releases only from the stable branch. The goal would not be to > release the new and shiny, but instead to get bugfixes out to users > quicker. The new and shiny (master) would remain as it is now. In > other words: more users would be affected by this change than just the > vanguard. > I see. This is something we could certainly do. It would require, however, that we be more pro-active about continuing to merge things to the stable branch after the release. Currently the stable branch is essentially in the same state that it was in for the 7.10.2 release. I've left it this way as it takes time and care to cherry-pick patches to stable. Thus far my policy has been to perform this work lazily until it's clear that we will do another stable release, as otherwise the effort may well be wasted. So, even if the steps of building, testing, and uploading the release are streamlined, more frequent releases are still far from free. Whether it's a worthwhile cost I don't know. This is a difficult question to answer without knowing more about how typical users actually acquire GHC. For instance, this effort would have minimal impact on users who get their compiler through their distribution's package manager. On the other hand, if most users download GHC bindists directly from the GHC download page, then perhaps this would be effort well-spent. 
Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 472 bytes Desc: not available URL: From michael at diglumi.com Wed Sep 2 16:26:51 2015 From: michael at diglumi.com (Michael Smith) Date: Wed, 02 Sep 2015 16:26:51 +0000 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: The package description for that is "The GHC compiler's view of the GHC package database format", and this doesn't really have to do with the package database format. Would it be okay to put this in there anyway? On Wed, Sep 2, 2015, 07:33 Simon Peyton Jones wrote: > we already have such a shared library, I think: bin-package-db. would > that do? > > > > Simon > > > > *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Michael > Smith > *Sent:* 02 September 2015 09:21 > *To:* Matthew Pickering > *Cc:* GHC developers > *Subject:* Re: Shared data type for extension flags > > > > That sounds like a good approach. Are there other things that would go > nicely > in a shared package like this, in addition to the extension data type? > > > > On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering < > matthewtpickering at gmail.com> wrote: > > Surely the easiest way here (including for other tooling - ie > haskell-src-exts) is to create a package which just provides this > enumeration. GHC, cabal, th, haskell-src-exts and so on then all > depend on this package rather than creating their own enumeration. > > > On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith wrote: > > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the > > capababilty > > to Template Haskell to detect which language extensions enabled. > > Unfortunately, > > since template-haskell can't depend on ghc (as ghc depends on > > template-haskell), > > it can't simply re-export the ExtensionFlag type from DynFlags to the > user. 
> > > > There is a second data type encoding the list of possible language > > extensions in > > the Cabal package, in Language.Haskell.Extension [3]. But > template-haskell > > doesn't already depend on Cabal, and doing so seems like it would cause > > difficulties, as the two packages can be upgraded separately. > > > > So adding this new feature to Template Haskell requires introducing a > > *third* > > data type for language extensions. It also requires enumerating this full > > list > > in two more places, to convert back and forth between the TH Extension > data > > type > > and GHC's internal ExtensionFlag data type. > > > > Is there another way here? Can there be one single shared data type for > this > > somehow? > > > > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 > > [2] https://phabricator.haskell.org/D1200 > > [3] > > > https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html > > > > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.zimm at gmail.com Wed Sep 2 18:39:33 2015 From: alan.zimm at gmail.com (Alan & Kim Zimmerman) Date: Wed, 2 Sep 2015 20:39:33 +0200 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: Would this be a feasible approach for harmonising the AST between GHC and TH too? Alan On 2 Sep 2015 09:27, "Michael Smith" wrote: > The package description for that is "The GHC compiler's view of the GHC > package database format", and this doesn't really have to do with the > package database format. Would it be okay to put this in there anyway? > > On Wed, Sep 2, 2015, 07:33 Simon Peyton Jones > wrote: > >> we already have such a shared library, I think: bin-package-db. would >> that do? 
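[The shared-enumeration idea discussed in this thread can be sketched in plain Haskell. The names and the four-extension list below are illustrative only; GHC's real ExtensionFlag type has far more constructors, and no such shared package existed at the time of this thread.]

```haskell
-- Hedged sketch: one enumeration that ghc, template-haskell, and Cabal
-- could all depend on, with each tool converting to its own internal
-- type. All names here are hypothetical.
module SharedExtension where

-- The single shared enumeration.
data Extension
  = Cpp
  | OverloadedStrings
  | TemplateHaskell
  | DataKinds
  deriving (Eq, Ord, Show, Read, Enum, Bounded)

-- Stand-in for GHC's internal DynFlags flag type.
data ExtensionFlag
  = Opt_Cpp
  | Opt_OverloadedStrings
  | Opt_TemplateHaskell
  | Opt_DataKinds
  deriving (Eq, Show)

-- Keeping the mapping total (no wildcard case) turns a newly added
-- extension into an incomplete-pattern warning here, so the two types
-- cannot silently drift apart.
toExtensionFlag :: Extension -> ExtensionFlag
toExtensionFlag Cpp               = Opt_Cpp
toExtensionFlag OverloadedStrings = Opt_OverloadedStrings
toExtensionFlag TemplateHaskell   = Opt_TemplateHaskell
toExtensionFlag DataKinds         = Opt_DataKinds

-- The full list comes for free from the Enum/Bounded instances.
allExtensions :: [Extension]
allExtensions = [minBound .. maxBound]
```

[With one such package, the "third data type" and the two hand-written conversion tables the thread worries about would collapse into total functions like toExtensionFlag above.]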
>> >> >> >> Simon >> >> >> >> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Michael >> Smith >> *Sent:* 02 September 2015 09:21 >> *To:* Matthew Pickering >> *Cc:* GHC developers >> *Subject:* Re: Shared data type for extension flags >> >> >> >> That sounds like a good approach. Are there other things that would go >> nicely >> in a shared package like this, in addition to the extension data type? >> >> >> >> On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering < >> matthewtpickering at gmail.com> wrote: >> >> Surely the easiest way here (including for other tooling - ie >> haskell-src-exts) is to create a package which just provides this >> enumeration. GHC, cabal, th, haskell-src-exts and so on then all >> depend on this package rather than creating their own enumeration. >> >> >> On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith >> wrote: >> > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the >> > capababilty >> > to Template Haskell to detect which language extensions enabled. >> > Unfortunately, >> > since template-haskell can't depend on ghc (as ghc depends on >> > template-haskell), >> > it can't simply re-export the ExtensionFlag type from DynFlags to the >> user. >> > >> > There is a second data type encoding the list of possible language >> > extensions in >> > the Cabal package, in Language.Haskell.Extension [3]. But >> template-haskell >> > doesn't already depend on Cabal, and doing so seems like it would cause >> > difficulties, as the two packages can be upgraded separately. >> > >> > So adding this new feature to Template Haskell requires introducing a >> > *third* >> > data type for language extensions. It also requires enumerating this >> full >> > list >> > in two more places, to convert back and forth between the TH Extension >> data >> > type >> > and GHC's internal ExtensionFlag data type. >> > >> > Is there another way here? Can there be one single shared data type for >> this >> > somehow? 
>> > >> > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 >> > [2] https://phabricator.haskell.org/D1200 >> > [3] >> > >> https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html >> > >> >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > >> >> >> > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marlowsd at gmail.com Wed Sep 2 18:51:38 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Wed, 2 Sep 2015 11:51:38 -0700 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: Message-ID: <55E7453A.90309@gmail.com> On 01/09/2015 11:34, Thomas Miedema wrote: > Hello all, > > my arguments against Phabricator are here: > https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator. Thanks for taking the time to summarize all the issues. Personally, I think github's support for code reviews is too weak to recommend it over Phabricator. The multiple-email problem is a killer all by itself. We can improve the workflow for Phabricator: some of the issues you raise are fixable, such as fixing the base revision to use, and ignoring untracked files (these are local settings, I believe). Stacks of commits are hard for reviewers to follow, so making them easier might have a detrimental effect on our processes. It might feel better for the author, but discovering what changed between two branches of multiple commits on github is almost impossible. Instead the recommended workflow seems to be to add more commits, which makes the history harder to read later. I have only had to update my arc once. Is that a big problem? 
Cheers Simon > Some quotes from #ghc to pique your curiosity (there are some 50 more): > * "is arc broken today?" > * "arc is a frickin' mystery." > * "i have a theory that i've managed to create a revision that phab > can't handle." > * "Diffs just seem to be too expensive to create ... I can't blame > contributors for not wanting to do this for every atomic change" > * "but seriously, we can't require this for contributing to GHC... the > entry barrier is already high enough" > > GitHub has side-by-side diffs > nowadays, and > Travis-CI can run `./validate --fast` comfortably > . > > *Proposal: accept pull requests from contributors on > https://github.com/ghc/ghc.* > > Details: > * use Travis-CI to validate pull requests. > * keep using the Trac issue tracker (contributors are encouraged to > put a link to their pull-request in the 'Differential Revisions' field). > * keep using the Trac wiki. > * in discussions on GitHub, use https://ghc.haskell.org/ticket/1234 to > refer to Trac ticket 1234. The shortcut #1234 only works on Trac itself. > * keep pushing to git.haskell.org , where the > existing Git receive hooks can do their job keeping tabs, trailing > whitespace and dangling submodule references out, notify Trac and send > emails. Committers close pull-requests manually, just like they do Trac > tickets. > * keep running Phabricator for as long as necessary. > * mention that pull requests are accepted on > https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/FixingBugs. > > My expectation is that the majority of patches will start coming in via > pull requests, the number of contributions will go up, commits will be > smaller, and there will be more of them per pull request (contributors > will be able to put style changes and refactorings into separate > commits, without jumping through a bunch of hoops). > > Reviewers will get many more emails. Other arguments against GitHub are > here: https://ghc.haskell.org/trac/ghc/wiki/WhyNotGitHub. 
> > I probably missed a few things, so fire away. > > Thanks, > Thomas > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From tuncer.ayaz at gmail.com Wed Sep 2 19:21:00 2015 From: tuncer.ayaz at gmail.com (Tuncer Ayaz) Date: Wed, 2 Sep 2015 21:21:00 +0200 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <55E7453A.90309@gmail.com> References: <55E7453A.90309@gmail.com> Message-ID: On Wed, Sep 2, 2015 at 8:51 PM, Simon Marlow wrote: > Stacks of commits are hard for reviewers to follow, so making them > easier might have a detrimental effect on our processes. It might > feel better for the author, but discovering what changed between two > branches of multiple commits on github is almost impossible. Instead > the recommended workflow seems to be to add more commits, which > makes the history harder to read later. I've reviewed+merged various big diffs in the form of branches published as pull requests (on and off GitHub), and being able to see each change separately with its own commit message was way easier than one big diff with a summarized message. If Phabricator used merge commits, reading multi-commit history, especially which commits got merged together (i.e. which branch was integrated), would be easy. Also, bisecting is more precise without collapsed diffs. Therefore, I wouldn't say the single-commit collapsed view is the right choice for all diffs. From michael at diglumi.com Wed Sep 2 19:33:24 2015 From: michael at diglumi.com (Michael Smith) Date: Wed, 02 Sep 2015 19:33:24 +0000 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: I don't know about the entire AST. GHC's AST contains a lot of complexity that one wouldn't want to expose at the TH level. And the separation allows GHC to change the internal AST around while maintaining a stable interface for packages depending on TH. 
That said, there are some bits that I could see being shared. Fixity and Strict from TH come to mind. On Wed, Sep 2, 2015, 11:39 Alan & Kim Zimmerman wrote: > Would this be a feasible approach for harmonising the AST between GHC and > TH too? > > Alan > On 2 Sep 2015 09:27, "Michael Smith" wrote: > >> The package description for that is "The GHC compiler's view of the GHC >> package database format", and this doesn't really have to do with the >> package database format. Would it be okay to put this in there anyway? >> >> On Wed, Sep 2, 2015, 07:33 Simon Peyton Jones >> wrote: >> >>> we already have such a shared library, I think: bin-package-db. would >>> that do? >>> >>> >>> >>> Simon >>> >>> >>> >>> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Michael >>> Smith >>> *Sent:* 02 September 2015 09:21 >>> *To:* Matthew Pickering >>> *Cc:* GHC developers >>> *Subject:* Re: Shared data type for extension flags >>> >>> >>> >>> That sounds like a good approach. Are there other things that would go >>> nicely >>> in a shared package like this, in addition to the extension data type? >>> >>> >>> >>> On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering < >>> matthewtpickering at gmail.com> wrote: >>> >>> Surely the easiest way here (including for other tooling - ie >>> haskell-src-exts) is to create a package which just provides this >>> enumeration. GHC, cabal, th, haskell-src-exts and so on then all >>> depend on this package rather than creating their own enumeration. >>> >>> >>> On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith >>> wrote: >>> > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the >>> > capababilty >>> > to Template Haskell to detect which language extensions enabled. >>> > Unfortunately, >>> > since template-haskell can't depend on ghc (as ghc depends on >>> > template-haskell), >>> > it can't simply re-export the ExtensionFlag type from DynFlags to the >>> user. 
>>> > >>> > There is a second data type encoding the list of possible language >>> > extensions in >>> > the Cabal package, in Language.Haskell.Extension [3]. But >>> template-haskell >>> > doesn't already depend on Cabal, and doing so seems like it would cause >>> > difficulties, as the two packages can be upgraded separately. >>> > >>> > So adding this new feature to Template Haskell requires introducing a >>> > *third* >>> > data type for language extensions. It also requires enumerating this >>> full >>> > list >>> > in two more places, to convert back and forth between the TH Extension >>> data >>> > type >>> > and GHC's internal ExtensionFlag data type. >>> > >>> > Is there another way here? Can there be one single shared data type >>> for this >>> > somehow? >>> > >>> > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 >>> > [2] https://phabricator.haskell.org/D1200 >>> > [3] >>> > >>> https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html >>> > >>> >>> > _______________________________________________ >>> > ghc-devs mailing list >>> > ghc-devs at haskell.org >>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> > >>> >>> >>> >> >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From _deepfire at feelingofgreen.ru Wed Sep 2 20:42:30 2015 From: _deepfire at feelingofgreen.ru (Kosyrev Serge) Date: Wed, 02 Sep 2015 23:42:30 +0300 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <55E7453A.90309@gmail.com> (sfid-20150902_231247_674400_122691D5) (Simon Marlow's message of "Wed, 2 Sep 2015 11:51:38 -0700") References: <55E7453A.90309@gmail.com> Message-ID: <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> Simon Marlow writes: > On 01/09/2015 11:34, Thomas Miedema wrote: >> Hello all, >> >> my arguments against Phabricator are here: >> https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator. > > Thanks for taking the time to summarize all the issues. > > Personally, I think github's support for code reviews is too weak to recommend it > over Phabricator. The multiple-email problem is a killer all by itself. As a wild idea -- did anyone look at /Gitlab/ instead? I didn't look into its review functionality to any meaningful degree, but: - it largely tries to replicate the Github workflow - Gitlab CE is open source - it evolves fairly quickly -- с уважением / respectfully, Косырев Сергей -- "And those who were seen dancing were thought to be insane by those who could not hear the music." -- Friedrich Wilhelm Nietzsche From thomasmiedema at gmail.com Wed Sep 2 21:00:00 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Wed, 2 Sep 2015 23:00:00 +0200 Subject: Testsuite and validate changes Message-ID: All, I made the following changes today: * `make accept` now runs all tests for a single way (instead of all ways) * `make test` now runs all tests for a single way (instead of all ways) * `./validate` now runs all tests for a single way (instead of skipping some tests) * Phabricator now runs all tests for a single way (instead of skipping some tests) You can run `make slowtest` in the root directory, or `make slow` in the testsuite directory, to get the old behavior of `make test` back. 
More information: * https://ghc.haskell.org/trac/ghc/wiki/Building/RunningTests/Running#Speedsettings * https://phabricator.haskell.org/D1178 * Note [validate and testsuite speed] in the toplevel Makefile Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at nh2.me Wed Sep 2 21:09:06 2015 From: mail at nh2.me (=?UTF-8?B?TmlrbGFzIEhhbWLDvGNoZW4=?=) Date: Wed, 02 Sep 2015 23:09:06 +0200 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> Message-ID: <55E76572.3050405@nh2.me> On 02/09/15 22:42, Kosyrev Serge wrote: > As a wild idea -- did anyone look at /Gitlab/ instead? Hi, yes. It does not currently have sufficient review functionality (it cannot handle multiple revisions easily). On 02/09/15 20:51, Simon Marlow wrote: > It might feel better > for the author, but discovering what changed between two branches of > multiple commits on github is almost impossible. I disagree with the first part of this: When the UI of the review tool is good, it is easy to follow. But there's no open-source implementation of that around. I agree that it is not easy to follow on Github. From rf at rufflewind.com Wed Sep 2 22:42:06 2015 From: rf at rufflewind.com (Phil Ruffwind) Date: Wed, 2 Sep 2015 18:42:06 -0400 Subject: Foreign calls and periodic alarm signals Message-ID: TL;DR: Does 'foreign import safe' silence the periodic alarm signals? I received a report on this rather strange bug in 'directory': https://github.com/haskell/directory/issues/35#issuecomment-136890912 I've concluded based on the dtruss log that it's caused by the timer signal that the GHC runtime emits. Somewhere inside the guts of 'realpath' on Mac OS X, there is a function that does the moral equivalent of: while (statfs64(...) 
&& errno == EINTR); On a slow filesystem like SSHFS, this can cause a permanent hang from the barrage of signals. The reporter found that using 'foreign import safe' mitigates the issue. What I'm mainly curious about is: is this something that the GHC runtime guarantees -- is using 'foreign import safe' assured to turn off the periodic signals for that thread? I tried reading this article [1], which seems to be the only documentation I could find about this, and it didn't really go into much depth about them. (I also couldn't find any info about how frequently they occur, on which threads they occur, or which specific signal it uses.) I'm also concerned whether there are other foreign functions out in the wild that could suffer the same bug, but remain hidden because they normally complete before the next alarm signal. [1]: https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Signals From allbery.b at gmail.com Thu Sep 3 00:10:29 2015 From: allbery.b at gmail.com (Brandon Allbery) Date: Wed, 2 Sep 2015 20:10:29 -0400 Subject: [Haskell-cafe] Foreign calls and periodic alarm signals In-Reply-To: <20150902235620.7FFD7F3936@mail.avvanta.com> References: <20150902235620.7FFD7F3936@mail.avvanta.com> Message-ID: On Wed, Sep 2, 2015 at 7:56 PM, Donn Cave wrote: > Sure are, though I don't know of any that have been identified so > directly as yours. I mean it sounds like you know where and how it's > breaking. Usually we just know something's dying on an interrupt and > then think to try turning off the signal barrage. It's interesting > that you're getting a stall instead, due to an EINTR loop. > network is moderately infamous for (formerly?) using unsafe calls that block.... -- brandon s allbery kf8nh sine nomine associates allbery.b at gmail.com ballbery at sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net -------------- next part -------------- An HTML attachment was scrubbed... 
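[Archive note] For readers following the thread above: the distinction under discussion is between GHC's 'unsafe' and 'safe' flavours of foreign import. A minimal sketch in Haskell -- the 'realpath' binding here is illustrative only, not the actual code in 'directory':

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module RealpathBinding where

import Foreign.C.String (CString)

-- An 'unsafe' import is a bare C call made directly from Haskell code.
-- While it runs, the calling OS thread can still receive the RTS's
-- periodic timer signal, so C code that retries on EINTR (like the
-- statfs64 loop quoted above) may never make progress on a slow
-- filesystem such as SSHFS.
foreign import ccall unsafe "stdlib.h realpath"
  c_realpath_unsafe :: CString -> CString -> IO CString

-- A 'safe' import releases the capability and runs the call outside
-- the Haskell execution context, which the reporter found mitigates
-- the hang; whether the timer signal is *guaranteed* to be masked for
-- such calls is exactly the open question in this thread.
foreign import ccall safe "stdlib.h realpath"
  c_realpath_safe :: CString -> CString -> IO CString
```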
URL: From jan.stolarek at p.lodz.pl Thu Sep 3 03:57:59 2015 From: jan.stolarek at p.lodz.pl (Jan Stolarek) Date: Thu, 3 Sep 2015 05:57:59 +0200 Subject: HEADS UP: interface file format change, full rebuild required Message-ID: <201509030557.59638.jan.stolarek@p.lodz.pl> I just pushed the injective type families patch, which changes the interface file format. A full rebuild of GHC is required after you pull. Jan From austin at well-typed.com Thu Sep 3 04:41:46 2015 From: austin at well-typed.com (Austin Seipp) Date: Wed, 2 Sep 2015 23:41:46 -0500 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <55E76572.3050405@nh2.me> References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> <55E76572.3050405@nh2.me> Message-ID: (JFYI: I hate to announce my return with a giant novel of negative-nancy-ness about a proposal that just came up. I'm sorry about this!) TL;DR: I'm strongly -1 on this, because I think it introduces a lot of associated costs for everyone, the benefits aren't really clear, and I think it obscures the real core issue about "how do we get more contributors" and how to make that happen. Needless to say, GitHub does not magically solve both of these AFAICS. As is probably already widely known, I'm fairly against GitHub because I think at best its tools are mediocre and inappropriate for GHC - but I also don't think this proposal or the alternatives stemming from it are very good, and that it reduces visibility of the real, core complaints about what is wrong. Some of those problems may be with Phabricator, but it's hard to sort the wheat from the chaff, so to speak. For one, having two code review tools of any form is completely bonkers, TBQH. This is my biggest 'obvious' blocker. If we're going to switch, we should just switch. Having to have people decide how to contribute with two tools is as crazy as having two VCSs and just a way of asking people to get *more* confused, and have us answer more questions. 
That's something we need to avoid. For the same reason, I'm also not a fan of 'use third party thing to augment other thing to remove its deficiencies making it OK', because the problem is _it adds surface area_ and other problems in other cases. It is a solution that should be considered a last resort, because it is a logical solution that applies to everything. If we have a bot that moves GH PRs into Phab and then reviews them there, the surface area of what we have to maintain and explain has suddenly exploded: because now instead of 1 thing we have 3 things (GH, Phab, bot) and the 3 interactions between them, for a multiplier of *six* things we have to deal with. And then we use reviewable.io, because GH reviews are terrible, adding a 4th mechanism? It's Rube Goldberg-ian. We can logically 'automate' everything in all ways to make all contributors happy, but there's a real *cognitive* overhead to this and humans don't scale as well as computers do. It is not truly 'automated away' if the cognitive burden is still there. I also find it extremely strange to tell people "By the way, this method in which you've contributed, as was requested by community members, is actually a complete proxy for the real method of contributing, you can find all your imported code here". How is this supposed to make contribution *easier* as opposed to just more confusing? Now you've got the impression you're using "the real thing" when in reality it's shoved off somewhere else to have the nitpicking done. Just using Phabricator would be less complicated, IMO, and much more direct. The same thing goes for reviewable.io. Adding it as a layer over GitHub just makes the surface area larger, and puts less under our control. And is it going to exist in the same form in 2 or 3 years? Will it continue to offer the same tools, the same workflows that we "like", and what happens when we hit a wall? 
It's easy to say "probably" or "sure" to all this, until we hit something we dislike and have no possibility of fixing. And once you do all this, BTW, you can 'never go back'. It seems so easy to just say 'submit pull requests' once and nothing else, right? Wrong. Once you commit to that infrastructure, it is *there* and simply taking it out from under the feet of those using it is not only unfortunate, it is *a huge timesink to undo it all*. Which amounts to it never happening. Oh, but you can import everything elsewhere! The problem is you *can't* import everything, but more importantly you can't *import my memories in another way*, so it's a huge blow to contributors to ask them about these mental time sinks, then to forget them all. And as your project grows, this becomes more of a memory as you made a first and last choice to begin with. Phabricator was 'lucky' here because it had the gateway into being the first review tool for us. But that wasn't because it was *better* than GitHub. It was because we were already using it, and it did not interact badly with our other tools or force us to compromise things - so the *cost* was low. The cost is immeasurably higher by default against GitHub because of this, at least to me. That's just how it is sometimes. Keep in mind there is a cost to everything and how you fix it. GitHub is not a simple patch to add a GHC feature. It is a question that fundamentally concerns itself with the future of the project for a long time. The costs must be analyzed more aggressively. Again, Phabricator had 'first child' preferential treatment. That's not something we can undo now. I know this sounds like a lot of ad hoc mumbo jumbo, but please bear with me: we need to identify the *root issue* here to fix it. Otherwise we will pay for the costs of an improper fix for a long time, and we are going to keep having this conversation over, and over again. And we need to weigh in the cost of fixing it, which is why I mention that so much. 
So with all this in mind, you're back to just using GitHub. But again GitHub is quite mediocre at best. So what is the point of all this? It's hinted at here: > the number of contributions will go up, commits will be smaller, and there will be more of them per pull request (contributors will be able to put style changes and refactorings into separate commits, without jumping through a bunch of hoops). The real hint is that "the number of contributions will go up". That's a noble goal and I think it's at the heart of this proposal. Here's the meat of it question: what is the cost of achieving this goal? That is, what amount of work is sufficient to make this goal realizable, and finally - why is GitHub *the best use of our time for achieving this?* That's one aspect of the cost - that it's the best use of the time. I feel like this is fundamentally why I always seem to never 'get' this argument, and I'm sure it's very frustrating on behalf of the people who have talked to me about it and like GitHub. But I feel like I've never gotten a straight answer for GHC. If the goal is actually "make more people contribute", that's pretty broad. I can make that very easy: give everyone who ever submits a patch push access. This is a legitimate way to run large projects that has worked. People will almost certainly be more willing to commit, especially when overhead on patch submission is reduced so much. Why not just do that instead? It's not like we even mandate code review, although we could. You could reasonably trust CI to catch and revert things a lot of the time for people who commit directly to master. We all do it sometimes. I'm being serious about this. I can start doing that tomorrow because the *cost is low*, both now and reasonably speaking into some foreseeable future. It is one of many solutions to raw heart of the proposal. 
GitHub is not a low cost move, but also, it is a *long term cost* because of the technical deficiencies it won't aim to address (merge commits are ugly, branch reviews are weak, ticket/PR namespace overlaps with Trac, etc etc) or that we'll have to work around. That means that if we want GitHub to fix the "give us more contributors" problem, and it has a high cost, it not only has _to fix the problem_, it also has to do that well enough to offset its cost. I don't think it's clear that is the case right now, among a lot of other solutions. I don't think the root issue is "We _need_ GitHub to get more contributors". It sounds like the complaint is more "I don't like how Phabricator works right now". That's an important distinction, because the latter is not only more specific, it's more actionable: - Things like Arcanist can be tracked as a Git submodule. There is little to no pain in this, it's low cost, and it can always be synchronized with Phabricator. This eliminates the "Must clone arcanist" and "need to upgrade arcanist" points. - Similarly when Phabricator sometimes kills a lot of builds, it's because I do an upgrade. That's mostly an error on my part and I can simply schedule upgrades regularly, barring hotfixes or somesuch. That should basically eliminate these. The other build issues are from picking the wrong base commit from the revision, I think, which I believe should be fixable upstream (I need to get a solid example of one that isn't a mega ultra patch.) - If Harbormaster is not building dependent patches as mentioned in WhyNotPhabricator, that is a bug, and I have not been aware of it. Please make me aware of it so I can file bugs! I seriously don't look at _every_ patch, I need to know this. That could have probably been fixed ASAP otherwise. - We can get rid of the awkwardness of squashes etc by using Phabricator's "immutable" history, although it introduces merge commits. 
Whether this is acceptable is up for debate (I dislike merge commits, but could live with it). - I do not understand point #3, about answering questions. Here's the reality: every single one of those cases is *almost always an error*. That's not a joke. Forgetting to commit a file, amending changes in the working tree, and specifying a reviewer are all total errors as it stands today. Why is this a minus? It catches a useful class of 'interaction bugs'. If it's because sometimes Phabricator yells about build artifacts in the tree, those should be .gitignore'd. If it's because you have to 'git stash' sometimes, this is fairly trivial IMO. Finally, specifying reviewers IS inconvenient, but currently needed. We could easily assign a '#reviewers' tag that would add default reviewers. - In the future, Phabricator will hopefully be able to automatically assign the right reviewers to every single incoming patch, based on the source file paths in the tree, using the Owners tool. Technically, we could do that today if we wanted, it's just a little more effort to add more Herald rules. This will be far, far more robust than anything GitHub can offer, and eliminates point #3. - Styling, linting etc errors being included, because reviews are hard to create: This is tangential IMO. We need to just bite the bullet on this and settle on some lint and coding styles, and apply them to the tree uniformly. The reality is *nobody ever does style changes on their own*, and they are always accompanied by a diff, and they always have to redo the work of pulling them out, Phab or not. Literally 99% of the time we ask for this, it happens this way. Perhaps instead we should just eliminate this class of work by just running linters over all of the source code at once, and being happy with it. Doing this in fact has other benefits: like `arc lint` will always _correctly_ report when linting errors are violated. And we can reject patches that violate them, because they will always be accurate. 
- As for some of the quotes, some of them are funny, but the real message lies in the context. :) In particular, there have been several cases (such as the DWARF work) where the idea was "write 30 commits and put them on Phabricator". News flash: *this is bad*, no matter whether you're using Phabricator or not, because it makes reviewing the whole thing immensely difficult from a reviewer perspective. The point here is that we can clear this up by being more communicative about what we expect of authors of large patches, and communicating your intent ASAP so we can get patches in as fast as possible. Writing a patch is the easiest part of the work. And more: - Clean up the documentation, it's a mess. It feels nice that everything has clear, lucid explanations on the wiki, but the wiki is ridiculously massive and we have a tendency for 'link creep' where we spread things out. The contributors docs could probably stand to be streamlined. We would have to do this anyway, moving to GitHub or not. - Improve the homepage, directly linking to this aforementioned page. - Make it clear what we expect of contributors. I feel like a lot of this could be explained by having a 5 minute drive-by guide for patches, and then a longer 10-minute guide about A) How to style things, B) How to format your patches if you're going to contribute regularly, C) Why it is this way, and D) finally links to all the other things you need to know. People going into Phabricator expecting it to behave like GitHub is a problem (more a cultural problem IMO but that's another story), and if this can't be directly fixed, the best thing to do is make it clear why it isn't. Those are just some of the things OTTOMH, but this email is already way too long. This is what I mean though: fixing most of these is going to have *seriously smaller cost* than moving to GitHub. 
It does not account for "The GitHub factor" of people contributing "just because it's on GitHub", but again, that value has to outweigh the other costs. I'm not seriously convinced it does. I know it's work to fix these things. But GitHub doesn't really magically make a lot of our needs go away, and it's not going to magically fix things like style or lint errors, the fact Travis-CI is still pretty insufficient for us in the long term (and Harbormaster is faster, on our own hardware, too), or that it will cause needlessly higher amounts of spam through Trac and GitHub itself. I don't think settling on it as - what seems to be - a first resort, is a really good idea. On Wed, Sep 2, 2015 at 4:09 PM, Niklas Hambüchen wrote: > On 02/09/15 22:42, Kosyrev Serge wrote: >> As a wild idea -- did anyone look at /Gitlab/ instead? > > Hi, yes. It does not currently have a sufficient review functionality > (cannot handle multiple revisions easily). > > On 02/09/15 20:51, Simon Marlow wrote: >> It might feel better >> for the author, but discovering what changed between two branches of >> multiple commits on github is almost impossible. > > I disagree with the first part of this: When the UI of the review tool > is good, it is easy to follow. But there's no open-source implementation > of that around. > > I agree that it is not easy to follow on Github. > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/ From austin at well-typed.com Thu Sep 3 04:46:10 2015 From: austin at well-typed.com (Austin Seipp) Date: Wed, 2 Sep 2015 23:46:10 -0500 Subject: HEADS UP: interface file format change, full rebuild required In-Reply-To: <201509030557.59638.jan.stolarek@p.lodz.pl> References: <201509030557.59638.jan.stolarek@p.lodz.pl> Message-ID: A long time coming. Congratulations! 
On Wed, Sep 2, 2015 at 10:57 PM, Jan Stolarek wrote: > I just pushed injective type families patch, which changes interface file format. Full rebuild of > GHC is required after you pull. > > Jan > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/ From michael at diglumi.com Thu Sep 3 05:03:31 2015 From: michael at diglumi.com (Michael Smith) Date: Wed, 2 Sep 2015 22:03:31 -0700 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> <55E76572.3050405@nh2.me> Message-ID: On Wed, Sep 2, 2015 at 9:41 PM, Austin Seipp wrote: > - Make it clear what we expect of contributors. I feel like a lot of > this could be explained by having a 5 minute drive-by guide for > patches, and then a longer 10-minute guide about A) How to style > things, B) How to format your patches if you're going to contribute > regularly, C) Why it is this way, and D) finally links to all the > other things you need to know. People going into Phabricator expecting > it to behave like GitHub is a problem (more a cultural problem IMO but > that's another story), and if this can't be directly fixed, the best > thing to do is make it clear why it isn't. > This is tangential to the issue of the code review system, and I don't want to derail the discussion here, but if you're talking about a drive-by guide for patches, I'd add E) straightforward instructions on how to get GHC building *fast* for development. A potential contributor won't even reach the patch submission stage if they can't get the build system set up properly, and the current documentation here is spread out and somewhat intimidating for a newcomer. -------------- next part -------------- An HTML attachment was scrubbed... 
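[Archive note] For anyone wondering what the injective type families patch announced in the HEADS UP above actually adds: a type family may now declare that its result determines its arguments, via a result annotation. A small illustrative sketch of the new syntax (not code from the patch itself):

```haskell
{-# LANGUAGE TypeFamilies, TypeFamilyDependencies #-}
module InjectiveTFDemo where

-- The "= r | r -> a" annotation says the result 'r' determines the
-- argument 'a', letting GHC improve 'a' from a known 'F a', much like
-- functional dependencies on classes.
type family F a = r | r -> a where
  F Int  = Bool
  F Bool = Int
```

The injectivity information has to be recorded in interface files, which is why the format changed and a full rebuild is required.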
URL: From eir at cis.upenn.edu Thu Sep 3 06:27:16 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Wed, 2 Sep 2015 23:27:16 -0700 Subject: addTopDecls restrictions Message-ID: <77F076EA-745A-4340-8F3F-92030A93E81A@cis.upenn.edu> Hi Geoff, The TH addTopDecls function is restricted to only a few kinds of declarations (functions, mostly). This set has been expanded in #10486 (https://ghc.haskell.org/trac/ghc/ticket/10486). Do you remember why the set of allowed declarations is restricted? It looks to me like any declaration would be OK. Thanks! Richard From joehillen at gmail.com Thu Sep 3 07:18:03 2015 From: joehillen at gmail.com (Joe Hillenbrand) Date: Thu, 3 Sep 2015 00:18:03 -0700 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> Message-ID: > As a wild idea -- did anyone look at /Gitlab/ instead? My personal experience with Gitlab at a previous job is that it is extremely unstable. I'd say even more unstable than trac and phabricator. It's especially bad when dealing with long files. From michael at diglumi.com Thu Sep 3 07:22:11 2015 From: michael at diglumi.com (Michael Smith) Date: Thu, 3 Sep 2015 00:22:11 -0700 Subject: A process for reporting security-sensitive issues Message-ID: I feel there should be some process for reporting security-sensitive issues in GHC -- for example, #9562 and #10826 in Trac. Perhaps something like the SensitiveTicketsPlugin [3] could be used? [1] https://ghc.haskell.org/trac/ghc/ticket/9562 [2] https://ghc.haskell.org/trac/ghc/ticket/10826 [3] https://trac-hacks.org/wiki/SensitiveTicketsPlugin -------------- next part -------------- An HTML attachment was scrubbed... 
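[Archive note] For context on the addTopDecls question above: addTopDecls lets a Template Haskell splice emit extra top-level declarations into the module being compiled, but only a few declaration forms (value bindings, signatures, and the like) are accepted, which is the restriction #10486 proposes to relax. A small illustrative sketch of its use -- the generated 'helper' binding is made up for the example:

```haskell
{-# LANGUAGE TemplateHaskell #-}
module AddTopDeclsDemo where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax (addTopDecls)

-- A splice that emits a fresh top-level binding and evaluates to a
-- reference to it.  Passing, say, a data or instance declaration to
-- addTopDecls is currently rejected.
defineHelper :: Q Exp
defineHelper = do
  nm <- newName "helper"
  d  <- valD (varP nm) (normalB [| 42 :: Int |]) []
  addTopDecls [d]
  varE nm
```

Used as `$(defineHelper)` in an expression context, this splices the binding into the module and evaluates to it.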
URL: From thomasmiedema at gmail.com Thu Sep 3 09:53:40 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Thu, 3 Sep 2015 11:53:40 +0200 Subject: Proposal: accept pull requests on GitHub Message-ID: > > The real hint is that "the number of contributions will go up". That's > a noble goal and I think it's at the heart of this proposal. > It's not. What's at the heart of my proposal is that `arc` sucks. Most of those quotes I posted are from regular contributors (here's another one: "arcanist kinda makes stuff even more confusing than Git by itself"). Newcomers will give it their best shot, thinking it's just another thing they need to learn, thinking it's their fault for running into problems, thinking they'll get the hang of it eventually. Except they won't, or at least I haven't, after using it for over a year. Maybe the fundamental problem with Phabricator is that it doesn't understand Git well, and the problems I posted on https://ghc.haskell.org/trac/ghc/wiki/WhyNotPhabricator are just symptoms of it. I'm having trouble putting this into words though (something about branches and submodules). Perhaps someone else can? In my opinion it is a waste of our time trying to improve `arc` (it is 34000 lines of PHP btw + another 70000 LOC for libphutil), when `pull requests` are an obvious alternative that most of the Haskell community already uses. When you're going to require contributors to use a non-standard tool to get patches to your code review system, it better just work. `arc` is clearly failing us here, and I'm saying enough is enough. I need to think about your other points. Thank you for the thorough reply. > Here's the meat of it question: what is the cost of achieving this > goal? That is, what amount of work is sufficient to make this goal > realizable, and finally - why is GitHub *the best use of our time for > achieving this?* That's one aspect of the cost - that it's the best > use of the time. 
> > > On Wed, Sep 2, 2015 at 4:09 PM, Niklas Hamb?chen wrote: > > On 02/09/15 22:42, Kosyrev Serge wrote: > >> As a wild idea -- did anyone look at /Gitlab/ instead? > > > > Hi, yes. It does not currently have a sufficient review functionality > > (cannot handle multiple revisions easily). > > > > On 02/09/15 20:51, Simon Marlow wrote: > >> It might feel better > >> for the author, but discovering what changed between two branches of > >> multiple commits on github is almost impossible. > > > > I disagree with the first part of this: When the UI of the review tool > > is good, it is easy to follow. But there's no open-source implementation > > of that around. > > > > I agree that it is not easy to follow on Github. > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > > > > -- > Regards, > > Austin Seipp, Haskell Consultant > Well-Typed LLP, http://www.well-typed.com/ > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tab at snarc.org Thu Sep 3 10:43:55 2015 From: tab at snarc.org (Vincent Hanquez) Date: Thu, 3 Sep 2015 11:43:55 +0100 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: Message-ID: <55E8246B.8040108@snarc.org> On 03/09/2015 10:53, Thomas Miedema wrote: > > The real hint is that "the number of contributions will go up". That's > a noble goal and I think it's at the heart of this proposal. > > > When you're going to require contributors to use a non-standard tool > to get patches to your code review system, it better just work. `arc` > is clearly failing us here, and I'm saying enough is enough. 
Not only this, but there's (probably) lots of small/janitorial contributions that do not need the full power of phabricator or any sophisticated code review. Not accepting github PRs and forcing everyone to go through an uncommon tool (however formidable) is quite likely to turn those contributions away IMHO. -- Vincent -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmiedema at gmail.com Thu Sep 3 11:48:31 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Thu, 3 Sep 2015 13:48:31 +0200 Subject: Proposal: accept pull requests on GitHub In-Reply-To: <55E8246B.8040108@snarc.org> References: <55E8246B.8040108@snarc.org> Message-ID: On Thu, Sep 3, 2015 at 12:43 PM, Vincent Hanquez wrote: > there's (probably) lots of small/janitorial contributions that do not need > the full power of phabricator or any sophisticated code review. > Austin's point, and I agree, is that we shouldn't optimize the system for those contributions. Cleanup, documentation and other small patches are very much welcomed, and they usually get merged within a few days. To make a truly better GHC though, we very much depend on expert contributors, say to implement and review Backpack or DWARF-based backtraces. My point is that `arc` is hurting these expert contributors as much as, if not more than, everyone else. To get more expert contributors you need more newcomers, but don't optimize the system only for the newcomers. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tuncer.ayaz at gmail.com Thu Sep 3 12:29:00 2015 From: tuncer.ayaz at gmail.com (Tuncer Ayaz) Date: Thu, 3 Sep 2015 14:29:00 +0200 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> <55E76572.3050405@nh2.me> Message-ID: On Thu, Sep 3, 2015 at 6:41 AM, Austin Seipp wrote: > (JFYI: I hate to announce my return with a giant novel of > negative-nancy-ness about a proposal that just came up. I'm sorry > about this!) > > TL;DR: I'm strongly -1 on this, because I think it introduces a lot > of associated costs for everyone, the benefits aren't really clear, > and I think it obscures the real core issue about "how do we get > more contributors" and how to make that happen. Needless to say, > GitHub does not magically solve both of these AFAICS. Let me start off by saying I'm not arguing for GitHub or anything else to replace Phabricator. I'm merely trying to understand the problems with merge commits and patch sets. > - We can get rid of the awkwardness of squashes etc by using > Phabricator's "immutable" history, although it introduces merge > commits. Whether this is acceptable is up to debate (I dislike merge > commits, but could live with it). I'm genuinely curious about the need to avoid merge commits. I do avoid merge-master-to-topic-branch commits in submitted diffs, but unless you always only merge a single cumulative commit for each diff, merge commits are very useful for vcs history. > - As for some of the quotes, some of them are funny, but the real > message lies in the context. :) In particular, there have been > several cases (such as the DWARF work) where the idea was "write 30 > commits and put them on Phabricator". News flash: *this is bad*, no > matter whether you're using Phabricator or not, because it makes > reviewing the whole thing immensely difficult from a reviewer > perspective. 
The point here is that we can clear this up by being > more communicative about what we expect of authors of large patches, > and communicating your intent ASAP so we can get patches in as fast > as possible. Writing a patch is the easiest part of the work. I would also like to understand why reviewing a single commit is easier than the steps (commits) that led to the whole diff. Maybe I review stuff differently, but, as I wrote yesterday, I've always found it easier to follow the changes when it's split into proper commits. And instead of "big patch" I should have written "non-trivial patch". A 100-line unified diff can be equally hard to follow as a 1000-line diff, unless each diff hunk is accompanied with code comments. But comments don't always make sense in the code, and often enough it's best to keep it in the commit message only. Hence the need for splitting the work, and ideally committing as you work on it, with a final cleanup of rearranging commits into a proper set of commits. I'm repeating myself, but git-bisect is much more precise with relevant changes split up as they happened. From rpglover64 at gmail.com Thu Sep 3 13:59:55 2015 From: rpglover64 at gmail.com (Alex Rozenshteyn) Date: Thu, 3 Sep 2015 09:59:55 -0400 Subject: more releases In-Reply-To: <87si6wkdta.fsf@smart-cactus.org> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> <87si6wkdta.fsf@smart-cactus.org> Message-ID: I have the impression (no data to back it up, though) that no small number of users download bindists (because most OS packages are out of date: Debian Unstable is still on 7.8.4, as is Ubuntu Wily; Arch is on 7.10.1). On Wed, Sep 2, 2015 at 12:04 PM, Ben Gamari wrote: > Richard Eisenberg writes: > > > I think some of my idea was misunderstood here: my goal was to have > > quick releases only from the stable branch. 
The goal would not be to > > release the new and shiny, but instead to get bugfixes out to users > > quicker. The new and shiny (master) would remain as it is now. In > > other words: more users would be affected by this change than just the > > vanguard. > > > I see. This is something we could certainly do. > > It would require, however, that we be more pro-active about > continuing to merge things to the stable branch after the release. > Currently the stable branch is essentially in the same state that it was > in for the 7.10.2 release. I've left it this way as it takes time and > care to cherry-pick patches to stable. Thus far my policy has been to > perform this work lazily until it's clear that we will do > another stable release as otherwise the effort may well be wasted. > > So, even if the steps of building, testing, and uploading the release > are streamlined, more frequent releases are still far from free. Whether > it's a worthwhile cost I don't know. > > This is a difficult question to answer without knowing more about how > typical users actually acquire GHC. For instance, this effort would > have minimal impact on users who get their compiler through their > distribution's package manager. On the other hand, if most users > download GHC bindists directly from the GHC download page, then perhaps > this would be effort well-spent. > > Cheers, > > - Ben > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simonpj at microsoft.com Thu Sep 3 16:08:23 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Thu, 3 Sep 2015 16:08:23 +0000 Subject: D1182: Implement improved error messages for ambiguous type variables (#10733) In-Reply-To: <20150903061043.11268.51958@phabricator.haskell.org> References: <20150903061043.11268.51958@phabricator.haskell.org> Message-ID: <6bac15f299b2494187fdc47167cae02d@DB4PR30MB030.064d.mgd.msft.net> Edward | Jan's injective type families commit is causing tcfail220 to fail, but | that's unrelated to this ticket. This is true. I told Jan to commit anyway because tcfail220 is a "hsig" test, and a) I know that hsigs are in flux (although I am not clear about how) b) I don't understand them enough to fix. So I hope it's ok to have broken this. Jan and I can certainly help when you want to fix it. Meanwhile would you mark it as expect-broken. (Although I am not sure that it's worth opening a fresh ticket for it.) thanks Simon | -----Original Message----- | From: noreply at phabricator.haskell.org | [mailto:noreply at phabricator.haskell.org] | Sent: 03 September 2015 07:11 | To: Simon Peyton Jones | Subject: [Differential] [Commented On] D1182: Implement improved error | messages for ambiguous type variables (#10733) | | KaneTW added a comment. | | Jan's injective type families commit is causing tcfail220 to fail, but | that's unrelated to this ticket. | | | REPOSITORY | rGHC Glasgow Haskell Compiler | | REVISION DETAIL | https://phabricator.haskell.org/D1182 | | EMAIL PREFERENCES | https://phabricator.haskell.org/settings/panel/emailpreferences/ | | To: KaneTW, simonpj, bgamari, austin | Cc: goldfire, simonpj, thomie From ezyang at mit.edu Thu Sep 3 16:13:29 2015 From: ezyang at mit.edu (Edward Z. 
Yang) Date: Thu, 03 Sep 2015 09:13:29 -0700 Subject: Fwd: RE: D1182: Implement improved error messages for ambiguous type variables (#10733) Message-ID: <1441296802-sup-4146@sabre> It's certainly true that hsig is in flux, but it doesn't seem like injective type families should have broken this test. I'll take a look. Edward Excerpts from Simon Peyton Jones's message of 2015-09-03 09:08:23 -0700: > Edward > > | Jan's injective type families commit is causing tcfail220 to fail, but > | that's unrelated to this ticket. > > This is true. I told Jan to commit anyway because tcfail220 is a "hsig" test, and > a) I know that hsigs are in flux (although I am not clear about how) > b) I don't understand them enough to fix. > > So I hope it's ok to have broken this. Jan and I can certainly help when you want to fix it. > > Meanwhile would you mark it as expect-broken. (Although I am not sure that it's worth opening a fresh ticket for it.) > > thanks > > Simon > > | -----Original Message----- > | From: noreply at phabricator.haskell.org > | [mailto:noreply at phabricator.haskell.org] > | Sent: 03 September 2015 07:11 > | To: Simon Peyton Jones > | Subject: [Differential] [Commented On] D1182: Implement improved error > | messages for ambiguous type variables (#10733) > | > | KaneTW added a comment. > | > | Jan's injective type families commit is causing tcfail220 to fail, but > | that's unrelated to this ticket. 
> | > | > | REPOSITORY > | rGHC Glasgow Haskell Compiler > | > | REVISION DETAIL > | https://phabricator.haskell.org/D1182 > | > | EMAIL PREFERENCES > | https://phabricator.haskell.org/settings/panel/emailpreferences/ > | > | To: KaneTW, simonpj, bgamari, austin > | Cc: goldfire, simonpj, thomie --- End forwarded message --- From thomasmiedema at gmail.com Thu Sep 3 16:17:35 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Thu, 3 Sep 2015 18:17:35 +0200 Subject: D1182: Implement improved error messages for ambiguous type variables (#10733) In-Reply-To: <1441296802-sup-4146@sabre> References: <1441296802-sup-4146@sabre> Message-ID: The bug is triggered by Maybe now being a wired-in type. See https://phabricator.haskell.org/D1208 for a workaround. On Thu, Sep 3, 2015 at 6:13 PM, Edward Z. Yang wrote: > It's certainly true that hsig is in flux, but it doesn't seem like > injective type families should have broken this test. I'll take a look. > > Edward > > Excerpts from Simon Peyton Jones's message of 2015-09-03 09:08:23 -0700: > > Edward > > > > | Jan's injective type families commit is causing tcfail220 to fail, but > > | that's unrelated to this ticket. > > > > This is true. I told Jan to commit anyway because tcfail220 is a "hsig" > test, and > > a) I know that hsigs are in flux (although I am not clear about how) > > b) I don't understand them enough to fix. > > > > So I hope it's ok to have broken this. Jan and I can certainly help > when you want to fix it. > > > > Meanwhile would you mark it as expect-broken. (Although I am not sure > that it's worth opening a fresh ticket for it.) 
> > > > thanks > > > > Simon > > > > | -----Original Message----- > > | From: noreply at phabricator.haskell.org > > | [mailto:noreply at phabricator.haskell.org] > > | Sent: 03 September 2015 07:11 > > | To: Simon Peyton Jones > > | Subject: [Differential] [Commented On] D1182: Implement improved error > > | messages for ambiguous type variables (#10733) > > | > > | KaneTW added a comment. > > | > > | Jan's injective type families commit is causing tcfail220 to fail, but > > | that's unrelated to this ticket. > > | > > | > > | REPOSITORY > > | rGHC Glasgow Haskell Compiler > > | > > | REVISION DETAIL > > | https://phabricator.haskell.org/D1182 > > | > > | EMAIL PREFERENCES > > | https://phabricator.haskell.org/settings/panel/emailpreferences/ > > | > > | To: KaneTW, simonpj, bgamari, austin > > | Cc: goldfire, simonpj, thomie > --- End forwarded message --- > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hvriedel at gmail.com Thu Sep 3 16:41:02 2015 From: hvriedel at gmail.com (Herbert Valerio Riedel) Date: Thu, 03 Sep 2015 18:41:02 +0200 Subject: Shared data type for extension flags In-Reply-To: (Matthew Pickering's message of "Wed, 2 Sep 2015 10:00:40 +0200") References: Message-ID: <8737yvbgm9.fsf@gmail.com> On 2015-09-02 at 10:00:40 +0200, Matthew Pickering wrote: > Surely the easiest way here (including for other tooling - ie > haskell-src-exts) is to create a package which just provides this > enumeration. GHC, cabal, th, haskell-src-exts and so on then all > depend on this package rather than creating their own enumeration. 
I'm not sure this is such a good idea having a package many packages depend on if `ghc` is one of them, as this forces every install-plan which ends up involving the ghc package to be pinned to the very same version the `ghc` package was compiled against. This is a general problem affecting packages `ghc` depends upon (and as a side-note starting with GHC 7.10, we were finally able to cut the package-dependency between `ghc` and `Cabal`) Also, Cabal is not GHC specific, and contains a list of known extensions (`KnownExtension`) across multiple Haskell compilers https://github.com/haskell/cabal/blob/master/Cabal/Language/Haskell/Extension.hs and I assume the extension enumeration needed for GHC would be tailored to GHC's need and omit extensions not relevant to GHC, as well as include experimental/internal ones not suited for consumption by Cabal. From rwbarton at gmail.com Thu Sep 3 16:51:03 2015 From: rwbarton at gmail.com (Reid Barton) Date: Thu, 3 Sep 2015 12:51:03 -0400 Subject: Shared data type for extension flags In-Reply-To: <8737yvbgm9.fsf@gmail.com> References: <8737yvbgm9.fsf@gmail.com> Message-ID: On Thu, Sep 3, 2015 at 12:41 PM, Herbert Valerio Riedel wrote: > On 2015-09-02 at 10:00:40 +0200, Matthew Pickering wrote: > > Surely the easiest way here (including for other tooling - ie > > haskell-src-exts) is to create a package which just provides this > > enumeration. GHC, cabal, th, haskell-src-exts and so on then all > > depend on this package rather than creating their own enumeration. > > I'm not sure this is such a good idea having a package many packages > depend on if `ghc` is one of them, as this forces every install-plan > which ends up involving the ghc package to be pinned to the very same > version the `ghc` package was compiled against. 
> > This is a general problem affecting packages `ghc` depends upon (and as > a side-note starting with GHC 7.10, we were finally able to cut the > package-dependency between `ghc` and `Cabal`) > Surely this argument does not apply to a package created to hold data types that would otherwise live in the template-haskell or ghc packages. Regards, Reid Barton -------------- next part -------------- An HTML attachment was scrubbed... URL: From ezyang at mit.edu Thu Sep 3 17:51:43 2015 From: ezyang at mit.edu (Edward Z Yang) Date: Thu, 3 Sep 2015 17:51:43 +0000 Subject: Using GHC API to compile Haskell file In-Reply-To: References: <1440368677-sup-472@sabre>, Message-ID: Hello Neil, Sorry about the delay; I hadn't gotten around to seeing if I could reproduce it. Here is a working copy of the program which appears to work with GHC 7.10.2 on 64-bit Windows: module Main where import GHC import GHC.Paths ( libdir ) import DynFlags import SysTools main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags <- getSessionDynFlags setSessionDynFlags (gopt_set dflags Opt_Static) target <- guessTarget "Test.hs" Nothing setTargets [target] load LoadAllTargets Here is how I tested it: stack ghc -- -package ghc -package ghc-paths --make Main.hs (after stack installing ghc-paths) Did you mean the error occurred when you did set Opt_Static? I can't reproduce your specific error in that case either. Cheers, Edward Sent from Windows Mail From: Neil Mitchell Sent: Monday, August 24, 2015 12:42 AM To: Edward Z Yang Cc: ghc-devs at haskell.org Thanks Edward, that fixed the issue with GHC 7.8.3. 
While trying to replicate with 7.10.2 to submit a bug report, I got a different error, even with your fix included: C:\Users\NDMIT_~1\AppData\Local\Temp\ghc2428_1\ghc_4.o:ghc_3.c:(.text+0x55): undefined reference to `ZCMain_main_closure' Doing another diff of the command lines, I see ghc --make includes "Test.o" on the Link line, but the API doesn't. Thanks, Neil On Mon, Aug 24, 2015 at 12:00 AM, Edward Z. Yang wrote: > The problem is that the default code is trying to build a dynamically > linked executable, but the Windows distributions don't come with dlls > by default. > > Why doesn't the GHC API code pick this up? Based on snooping > ghc/Main.hs, it's probably because you need to call parseDynamicFlags* > which will call updateWays which will turn off -dynamic-too if the > platform doesn't support it. > > GHC bug? Absolutely! Please file a ticket. > > Edward > > Excerpts from Neil Mitchell's message of 2015-08-23 05:43:28 -0700: >> Hi, >> >> Is this the right place for GHC API queries? If not, is there anywhere better? >> >> I want to compile a Haskell module, much like `ghc --make` or `ghc -c` >> does. 
The sample code on the Haskell wiki >> (https://wiki.haskell.org/GHC/As_a_library#A_Simple_Example), >> StackOverflow (http://stackoverflow.com/a/5631338/160673) and in GHC >> API slides (http://sneezy.cs.nott.ac.uk/fplunch/weblog/wp-content/uploads/2008/12/ghc-api-slidesnotes.pdf) >> says: >> >> import GHC >> import GHC.Paths ( libdir ) >> import DynFlags >> >> main = >> defaultErrorHandler defaultFatalMessager defaultFlushOut $ do >> runGhc (Just libdir) $ do >> dflags <- getSessionDynFlags >> setSessionDynFlags dflags >> target <- guessTarget "Test.hs" Nothing >> setTargets [target] >> load LoadAllTargets >> >> However, given a `Test.hs` file with the contents `main = print 1`, I >> get the error: >> >> C:/Program Files (x86)/MinGHC-7.8.3/ghc-7.8.3/mingw/bin/ld.exe: >> cannot find -lHSbase-4.7.0.1-ghc7.8.3 >> C:/Program Files (x86)/MinGHC-7.8.3/ghc-7.8.3/mingw/bin/ld.exe: >> cannot find -lHSinteger-gmp-0.5.1.0-ghc7.8.3 >> C:/Program Files (x86)/MinGHC-7.8.3/ghc-7.8.3/mingw/bin/ld.exe: >> cannot find -lHSghc-prim-0.3.1.0-ghc7.8.3 >> C:/Program Files (x86)/MinGHC-7.8.3/ghc-7.8.3/mingw/bin/ld.exe: >> cannot find -lHSrts-ghc7.8.3 >> C:/Program Files (x86)/MinGHC-7.8.3/ghc-7.8.3/mingw/bin/ld.exe: >> cannot find -lffi-6 >> collect2: ld returned 1 exit status >> >> Has the recipe changed? >> >> By turning up the verbosity, I was able to compare the command line >> passed to the linker. The failing GHC API call contains: >> >> "-lHSbase-4.7.0.1-ghc7.8.3" "-lHSinteger-gmp-0.5.1.0-ghc7.8.3" >> "-lHSghc-prim-0.3.1.0-ghc7.8.3" "-lHSrts-ghc7.8.3" "-lffi-6" >> >> While the succeeding ghc --make contains: >> >> "-lHSbase-4.7.0.1" "-lHSinteger-gmp-0.5.1.0" >> "-lHSghc-prim-0.3.1.0" "-lHSrts" "-lCffi-6" >> >> Should I be getting DynFlags differently to influence those link variables? >> >> Thanks, Neil -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marlowsd at gmail.com Thu Sep 3 19:02:58 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Thu, 3 Sep 2015 12:02:58 -0700 Subject: D1182: Implement improved error messages for ambiguous type variables (#10733) In-Reply-To: <6bac15f299b2494187fdc47167cae02d@DB4PR30MB030.064d.mgd.msft.net> References: <20150903061043.11268.51958@phabricator.haskell.org> <6bac15f299b2494187fdc47167cae02d@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <55E89962.4020304@gmail.com> On 03/09/2015 09:08, Simon Peyton Jones wrote: > Edward > > | Jan's injective type families commit is causing tcfail220 to fail, but > | that's unrelated to this ticket. > > This is true. I told Jan to commit anyway because tcfail220 is a "hsig" test, and > a) I know that hsigs are in flux (although I am not clear about how) > b) I don't understand them enough to fix. > > So I hope it's ok to have broken this. Jan and I can certainly help when you want to fix it. In general we shouldn't commit anything that breaks validate, because this causes problems for other developers. The right thing to do would be to mark it expect_broken before committing. Cheers Simon > > Meanwhile would you mark it as expect-broken. (Although I am not sure that it's worth opening a fresh ticket for it.) > > thanks > > Simon > > > | -----Original Message----- > | From: noreply at phabricator.haskell.org > | [mailto:noreply at phabricator.haskell.org] > | Sent: 03 September 2015 07:11 > | To: Simon Peyton Jones > | Subject: [Differential] [Commented On] D1182: Implement improved error > | messages for ambiguous type variables (#10733) > | > | KaneTW added a comment. > | > | Jan's injective type families commit is causing tcfail220 to fail, but > | that's unrelated to this ticket. 
> | > | > | REPOSITORY > | rGHC Glasgow Haskell Compiler > | > | REVISION DETAIL > | https://phabricator.haskell.org/D1182 > | > | EMAIL PREFERENCES > | https://phabricator.haskell.org/settings/panel/emailpreferences/ > | > | To: KaneTW, simonpj, bgamari, austin > | Cc: goldfire, simonpj, thomie > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From jan.stolarek at p.lodz.pl Thu Sep 3 19:57:38 2015 From: jan.stolarek at p.lodz.pl (Jan Stolarek) Date: Thu, 3 Sep 2015 21:57:38 +0200 Subject: D1182: Implement improved error messages for ambiguous type variables (#10733) In-Reply-To: <55E89962.4020304@gmail.com> References: <6bac15f299b2494187fdc47167cae02d@DB4PR30MB030.064d.mgd.msft.net> <55E89962.4020304@gmail.com> Message-ID: <201509032157.38426.jan.stolarek@p.lodz.pl> > In general we shouldn't commit anything that breaks validate, because > this causes problems for other developers. The right thing to do would > be to mark it expect_broken before committing. Sorry for that. I was actually thinking about marking the test as expect_broken, but then the problem would be completely hidden. I wanted to discuss a possible solution with Simon and Edward first but it looks like Thomas already found a workaround. Jan From ezyang at mit.edu Thu Sep 3 21:12:31 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Thu, 03 Sep 2015 14:12:31 -0700 Subject: D1182: Implement improved error messages for ambiguous type variables (#10733) In-Reply-To: References: <1441296802-sup-4146@sabre> Message-ID: <1441314734-sup-6168@sabre> Thanks Thomas, I think this workaround is fine. Excerpts from Thomas Miedema's message of 2015-09-03 09:17:35 -0700: > The bug is triggered by Maybe now being a wired-in type. See > https://phabricator.haskell.org/D1208 for a workaround. > > On Thu, Sep 3, 2015 at 6:13 PM, Edward Z. 
Yang wrote: > > > It's certainly true that hsig is in flux, but it doesn't seem like > > injective type families should have broken this test. I'll take a look. > > > > Edward > > > > Excerpts from Simon Peyton Jones's message of 2015-09-03 09:08:23 -0700: > > > Edward > > > > > > | Jan's injective type families commit is causing tcfail220 to fail, but > > > | that's unrelated to this ticket. > > > > > > This is true. I told Jan to commit anyway because tcfail220 is a "hsig" > > test, and > > > a) I know that hsigs are in flux (although I am not clear about how) > > > b) I don't understand them enough to fix. > > > > > > So I hope it's ok to have broken this. Jan and I can certainly help > > when you want to fix it. > > > > > > Meanwhile would you mark it as expect-broken. (Although I am not sure > > that it's worth opening a fresh ticket for it.) > > > > > > thanks > > > > > > Simon > > > > > > | -----Original Message----- > > > | From: noreply at phabricator.haskell.org > > > | [mailto:noreply at phabricator.haskell.org] > > > | Sent: 03 September 2015 07:11 > > > | To: Simon Peyton Jones > > > | Subject: [Differential] [Commented On] D1182: Implement improved error > > > | messages for ambiguous type variables (#10733) > > > | > > > | KaneTW added a comment. > > > | > > > | Jan's injective type families commit is causing tcfail220 to fail, but > > > | that's unrelated to this ticket. 
> > > | > > > | REPOSITORY > > > | rGHC Glasgow Haskell Compiler > > > | > > > | REVISION DETAIL > > > | https://phabricator.haskell.org/D1182 > > > | > > > | EMAIL PREFERENCES > > > | https://phabricator.haskell.org/settings/panel/emailpreferences/ > > > | > > > | To: KaneTW, simonpj, bgamari, austin > > > | Cc: goldfire, simonpj, thomie > > --- End forwarded message --- > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > From marlowsd at gmail.com Thu Sep 3 22:12:07 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Thu, 3 Sep 2015 15:12:07 -0700 Subject: Foreign calls and periodic alarm signals In-Reply-To: References: Message-ID: <55E8C5B7.9030309@gmail.com> On 02/09/2015 15:42, Phil Ruffwind wrote: > TL;DR: Does 'foreign import safe' silence the periodic alarm signals? No it doesn't. Perhaps the fact that a safe FFI call may create another worker thread means that the timer signal has gone to the other thread and didn't interrupt the thread making the statfs64() call. There's pthread_sigmask() that could help, but it's pretty difficult to do this in a consistent way because we'd have to pthread_sigmask() every thread that runs Haskell code, including calls from outside. I'm not sure yet what the right solution is, but a good start would be to open a ticket. Cheers Simon > I received a report on this rather strange bug in 'directory': > > https://github.com/haskell/directory/issues/35#issuecomment-136890912 > > I've concluded based on the dtruss log that it's caused by the timer > signal that the GHC runtime emits. Somewhere inside the guts of > 'realpath' on Mac OS X, there is a function that does the moral > equivalent of: > > while (statfs64(...) && errno == EINTR); > > On a slow filesystem like SSHFS, this can cause a permanent hang from > the barrage of signals. 
> > The reporter found that using 'foreign import safe' mitigates the > issue. What I'm mainly curious about is: is this something that the GHC > runtime guarantees -- is using 'foreign import safe' assured to turn > off the periodic signals for that thread? > > I tried reading this article [1], which seems to be the only > documentation I could find about this, and it didn't really go into > much depth about them. (I also couldn't find any info about how > frequently they occur, on which threads they occur, or which specific > signal it uses.) > > I'm also concerned whether there are other foreign functions out in > the wild that could suffer the same bug, but remain hidden because > they normally complete before the next alarm signal. > > [1]: https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Signals > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From _deepfire at feelingofgreen.ru Thu Sep 3 22:31:53 2015 From: _deepfire at feelingofgreen.ru (Kosyrev Serge) Date: Fri, 04 Sep 2015 01:31:53 +0300 Subject: UNS: Re: Proposal: accept pull requests on GitHub In-Reply-To: (sfid-20150903_113912_535316_3EC6A2E7) (Joe Hillenbrand's message of "Thu, 3 Sep 2015 00:18:03 -0700") References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> Message-ID: <877fo7w2w6.fsf@andromedae.feelingofgreen.ru> Joe Hillenbrand writes: >> As a wild idea -- did anyone look at /Gitlab/ instead? > > My personal experience with Gitlab at a previous job is that it is > extremely unstable. I'd say even more unstable than trac and > phabricator. It's especially bad when dealing with long files. Curiously, for the nearly three years that we've been dealing with it, I couldn't point to a single instability (or even just a bug), despite using a moderately loaded instance of Gitlab. 
Also, not being a huge enterprise yet, Gitlab folks /might/ potentially be more responsive to feature requests from a prominent open-source project. -- с уважением / respectfully, Косырев Сергей From thomasmiedema at gmail.com Fri Sep 4 00:15:44 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Fri, 4 Sep 2015 02:15:44 +0200 Subject: [Diffusion] [Committed] rGHCbe0ce8718ea4: Fix for crash in setnumcapabilities001 Message-ID: Simon, for what it's worth, I sporadically (< once per month) see this test timing out on Phabricator. Latest occurrence: https://phabricator.haskell.org/harbormaster/build/5904/?l=0 Thomas On Fri, Jun 26, 2015 at 10:32 AM, simonmar (Simon Marlow) < noreply at phabricator.haskell.org> wrote: > simonmar committed rGHCbe0ce8718ea4: Fix for crash in > setnumcapabilities001 (authored by simonmar). > > Fix for crash in setnumcapabilities001 > > getNewNursery() was unconditionally incrementing next_nursery, which > is normally fine but it broke an assumption in > storageAddCapabilities(). This manifested as an occasional crash in > the setnumcapabilities001 test. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam at well-typed.com Fri Sep 4 05:50:12 2015 From: adam at well-typed.com (Adam Gundry) Date: Fri, 04 Sep 2015 06:50:12 +0100 Subject: A process for reporting security-sensitive issues In-Reply-To: References: Message-ID: <55E93114.6050800@well-typed.com> On 03/09/15 08:22, Michael Smith wrote: > I feel there should be some process for reporting security-sensitive issues > in GHC -- for example, #9562 and #10826 in Trac. Perhaps something like the > SensitiveTicketsPlugin [3] could be used? > > [1] https://ghc.haskell.org/trac/ghc/ticket/9562 > [2] https://ghc.haskell.org/trac/ghc/ticket/10826 > [3] https://trac-hacks.org/wiki/SensitiveTicketsPlugin Thanks for raising this. 
While I see where you are coming from, I'm going to argue against it, because I think it creates a false impression of the security guarantees GHC provides. Such a process may give the impression that there are people directly tasked with handling such security bugs, which is not currently the case. I think it is unreasonable for the security of a system to depend on GHC having no type soundness bugs, particularly since GHC is actively used for developing experimental type system features. #9562 has been open for a year and we don't have a good solution. Relatedly, I think the Safe Haskell documentation should prominently warn about the existence of #9562 and the possibility of other type soundness bugs, like it does for compilation safety issues. What do others think? Adam -- Adam Gundry, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/ From spam at scientician.net Fri Sep 4 05:55:52 2015 From: spam at scientician.net (Bardur Arantsson) Date: Fri, 4 Sep 2015 07:55:52 +0200 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> Message-ID: On 09/03/2015 09:18 AM, Joe Hillenbrand wrote: >> As a wild idea -- did anyone look at /Gitlab/ instead? > > My personal experience with Gitlab at a previous job is that it is > extremely unstable. I'd say even more unstable than trac and > phabricator. It's especially bad when dealing with long files. > If we're talking alternative systems, then I can personally recommend Gerrit (https://www.gerritcodereview.com/) which, while it *looks* pretty basic, works really well with the general Git workflow. For example, it tracks commits in individual reviews, but tracks dependencies between those commits. So when e.g. 
you push a new series of commits implementing a feature, all those reviews just get a new "version" and you can diff between different versions of each individual commit -- this often cuts down drastically on how much you have to re-review when a new version is submitted. You can also specify auto-merge when a review gets +2 (or +1, or whatever), including rebase-before-merge-and-ff instead of having merge commits which just clutter the history needlessly. You can set up various rules using a predicate-based rules engine, for example about a review needing two approvals and/or always needing approval from an (external) build system, etc. The only setup it needs is a git hook... which it will tell you exactly how to install with a single command when you push your first review. (It's some scp command, I seem to recall.) Caveat: I haven't tried using it on Windows. Regards, From rf at rufflewind.com Fri Sep 4 07:52:33 2015 From: rf at rufflewind.com (Phil Ruffwind) Date: Fri, 4 Sep 2015 03:52:33 -0400 Subject: Foreign calls and periodic alarm signals In-Reply-To: <55E8C5B7.9030309@gmail.com> References: <55E8C5B7.9030309@gmail.com> Message-ID: > a good start would be to open a ticket. Okay, done: https://ghc.haskell.org/trac/ghc/ticket/10840 From ezyang at mit.edu Fri Sep 4 08:03:45 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 01:03:45 -0700 Subject: Unlifted data types Message-ID: <1441353701-sup-9422@sabre> Hello friends, After many discussions and beers at ICFP, I've written up my current best understanding of the unlifted data types proposal: https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes Many thanks to Richard, Iavor, Ryan, Simon, Duncan, George, Paul, Edward Kmett, and any others who I may have forgotten for crystallizing this proposal.
Cheers, Edward From ndmitchell at gmail.com Fri Sep 4 12:40:19 2015 From: ndmitchell at gmail.com (Neil Mitchell) Date: Fri, 4 Sep 2015 13:40:19 +0100 Subject: Using GHC API to compile Haskell file In-Reply-To: References: <1440368677-sup-472@sabre> Message-ID: > Sorry about the delay; I hadn't gotten around to seeing if I could reproduce > it. Here is a working copy of the program which appears to work with GHC > 7.10.2 on 64-bit Windows: Thanks, that does indeed solve the first bit. To try and make it a bit clearer what I'm after, I've put the stuff in a git repo: https://github.com/ndmitchell/ghc-process/blob/master/Main.hs Looking at Main.hs, there are three modes, Process (run ghc.exe 3 times), APIMake (the code you sent me), and APISingle (attempt to replicate the 3 ghc.exe invocations through the GHC API). The first two work perfectly, following Edward's tweaks. The final one fails at linking. So I have two questions: 1) Is there any way to do the two compilations sharing some cached state, e.g. loaded packages/.hi files, so each compilation goes faster? 2) Is there any way to do the link alone through the GHC API? Thanks, Neil From eric at seidel.io Fri Sep 4 15:29:59 2015 From: eric at seidel.io (Eric Seidel) Date: Fri, 04 Sep 2015 08:29:59 -0700 Subject: Unlifted data types In-Reply-To: <1441353701-sup-9422@sabre> References: <1441353701-sup-9422@sabre> Message-ID: <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> You mention NFData in the motivation but then say that !Maybe !Int is not allowed. This leads me to wonder what the semantics of foo :: !Maybe Int -> !Maybe Int foo x = x bar = foo (Just undefined) are. Based on the FAQ it sounds like foo would *not* force the undefined, is that correct? Also, there's a clear connection between these UnliftedTypes and BangPatterns, but as I understand it the ! is essentially a new type constructor.
So while foo1 :: !Int -> !Int foo1 x = x and foo2 :: Int -> Int foo2 !x = x have the same runtime behavior, they have different types, so you can't pass a regular Int to foo1. Is that desirable? Eric On Fri, Sep 4, 2015, at 01:03, Edward Z. Yang wrote: > Hello friends, > > After many discussions and beers at ICFP, I've written up my current > best understanding of the unlifted data types proposal: > > https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes > > Many thanks to Richard, Iavor, Ryan, Simon, Duncan, George, Paul, > Edward Kmett, and any others who I may have forgotten for crystallizing > this proposal. > > Cheers, > Edward > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From ezyang at mit.edu Fri Sep 4 15:43:48 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 08:43:48 -0700 Subject: Unlifted data types In-Reply-To: <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> References: <1441353701-sup-9422@sabre> <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> Message-ID: <1441381088-sup-172@sabre> Excerpts from Eric Seidel's message of 2015-09-04 08:29:59 -0700: > You mention NFData in the motivation but then say that !Maybe !Int is > not allowed. This leads me to wonder what the semantics of > > foo :: !Maybe Int -> !Maybe Int > foo x = x > > bar = foo (Just undefined) > > are. Based on the FAQ it sounds like foo would *not* force the > undefined, is that correct? Yes. So maybe NFData is a *bad* example! > Also, there's a clear connection between these UnliftedTypes and > BangPatterns, but as I understand it the ! is essentially a new type > constructor. So while > > foo1 :: !Int -> !Int > foo1 x = x > > and > > foo2 :: Int -> Int > foo2 !x = x > > have the same runtime behavior, they have different types, so you can't > pass a regular Int to foo1. Is that desirable? Yes. 
Actually, you have a good point that we'd like to have functions 'force :: Int -> !Int' and 'suspend :: !Int -> Int'. Unfortunately, we can't generate 'Coercible' instances for these types unless Coercible becomes polykinded. Perhaps we can make a new type class, or just magic polymorphic functions. Edward From ezyang at mit.edu Fri Sep 4 15:45:37 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 08:45:37 -0700 Subject: Unlifted data types In-Reply-To: <1441381088-sup-172@sabre> References: <1441353701-sup-9422@sabre> <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> <1441381088-sup-172@sabre> Message-ID: <1441381504-sup-5051@sabre> Excerpts from Edward Z. Yang's message of 2015-09-04 08:43:48 -0700: > Yes. Actually, you have a good point that we'd like to have functions > 'force :: Int -> !Int' and 'suspend :: !Int -> Int'. Unfortunately, we > can't generate 'Coercible' instances for these types unless Coercible becomes > polykinded. Perhaps we can make a new type class, or just magic > polymorphic functions. Michael Greenberg points out on Twitter that suspend must be a special form, just like lambda abstraction. Edward From eric at seidel.io Fri Sep 4 16:06:15 2015 From: eric at seidel.io (Eric Seidel) Date: Fri, 04 Sep 2015 09:06:15 -0700 Subject: Unlifted data types In-Reply-To: <1441381088-sup-172@sabre> References: <1441353701-sup-9422@sabre> <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> <1441381088-sup-172@sabre> Message-ID: <1441382775.352880.374932065.34A2C130@webmail.messagingengine.com> Another good example would be foo :: ![Int] -> ![Int] Does this force just the first constructor or the whole spine? My guess would be the latter. On Fri, Sep 4, 2015, at 08:43, Edward Z. Yang wrote: > Excerpts from Eric Seidel's message of 2015-09-04 08:29:59 -0700: > > You mention NFData in the motivation but then say that !Maybe !Int is > > not allowed. 
This leads me to wonder what the semantics of > > > > foo :: !Maybe Int -> !Maybe Int > > foo x = x > > > > bar = foo (Just undefined) > > > > are. Based on the FAQ it sounds like foo would *not* force the > > undefined, is that correct? > > Yes. So maybe NFData is a *bad* example! > > > Also, there's a clear connection between these UnliftedTypes and > > BangPatterns, but as I understand it the ! is essentially a new type > > constructor. So while > > > > foo1 :: !Int -> !Int > > foo1 x = x > > > > and > > > > foo2 :: Int -> Int > > foo2 !x = x > > > > have the same runtime behavior, they have different types, so you can't > > pass a regular Int to foo1. Is that desirable? > > Yes. Actually, you have a good point that we'd like to have functions > 'force :: Int -> !Int' and 'suspend :: !Int -> Int'. Unfortunately, we > can't generate 'Coercible' instances for these types unless Coercible > becomes > polykinded. Perhaps we can make a new type class, or just magic > polymorphic functions. > > Edward From dan.doel at gmail.com Fri Sep 4 16:57:42 2015 From: dan.doel at gmail.com (Dan Doel) Date: Fri, 4 Sep 2015 12:57:42 -0400 Subject: Unlifted data types In-Reply-To: <1441353701-sup-9422@sabre> References: <1441353701-sup-9422@sabre> Message-ID: All your examples are non-recursive types. So, if I have: data Nat = Zero | Suc Nat what is !Nat? Does it just have the outer-most part unlifted? Is the intention to make the !a in data type declarations first-class, so that when we say: data Nat = Zero | Suc !Nat the !Nat part is now an entity in itself, and it is, for this declaration, the set of naturals, whereas Nat is the flat domain? On Fri, Sep 4, 2015 at 4:03 AM, Edward Z. 
Yang wrote: > Hello friends, > > After many discussions and beers at ICFP, I've written up my current > best understanding of the unlifted data types proposal: > > https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes > > Many thanks to Richard, Iavor, Ryan, Simon, Duncan, George, Paul, > Edward Kmett, and any others who I may have forgotten for crystallizing > this proposal. > > Cheers, > Edward > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From ezyang at mit.edu Fri Sep 4 18:12:39 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 11:12:39 -0700 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> Message-ID: <1441390306-sup-6240@sabre> Excerpts from Dan Doel's message of 2015-09-04 09:57:42 -0700: > All your examples are non-recursive types. So, if I have: > > data Nat = Zero | Suc Nat > > what is !Nat? Does it just have the outer-most part unlifted? Just the outermost part. > Is the intention to make the !a in data type declarations first-class, > so that when we say: > > data Nat = Zero | Suc !Nat > > the !Nat part is now an entity in itself, and it is, for this > declaration, the set of naturals, whereas Nat is the flat domain? No, in fact, there is a semantic difference between this and strict fields (which Paul pointed out to me.) There's now an updated proposal on the Trac which partially solves this problem. Edward From ezyang at mit.edu Fri Sep 4 18:14:50 2015 From: ezyang at mit.edu (Edward Z. 
Yang) Date: Fri, 04 Sep 2015 11:14:50 -0700 Subject: Unlifted data types In-Reply-To: <1441382775.352880.374932065.34A2C130@webmail.messagingengine.com> References: <1441353701-sup-9422@sabre> <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> <1441381088-sup-172@sabre> <1441382775.352880.374932065.34A2C130@webmail.messagingengine.com> Message-ID: <1441390373-sup-5413@sabre> Hello Eric, You can't tell; the head not withstanding, `[a]` is still a lazy list, so you would need to look at the function body to see if any extra forcing goes on. `Force` does not induce `seq`ing: it is an obligation for the call-site. (Added it to the FAQ). Edward Excerpts from Eric Seidel's message of 2015-09-04 09:06:15 -0700: > Another good example would be > > foo :: ![Int] -> ![Int] > > Does this force just the first constructor or the whole spine? My guess > would be the latter. > > On Fri, Sep 4, 2015, at 08:43, Edward Z. Yang wrote: > > Excerpts from Eric Seidel's message of 2015-09-04 08:29:59 -0700: > > > You mention NFData in the motivation but then say that !Maybe !Int is > > > not allowed. This leads me to wonder what the semantics of > > > > > > foo :: !Maybe Int -> !Maybe Int > > > foo x = x > > > > > > bar = foo (Just undefined) > > > > > > are. Based on the FAQ it sounds like foo would *not* force the > > > undefined, is that correct? > > > > Yes. So maybe NFData is a *bad* example! > > > > > Also, there's a clear connection between these UnliftedTypes and > > > BangPatterns, but as I understand it the ! is essentially a new type > > > constructor. So while > > > > > > foo1 :: !Int -> !Int > > > foo1 x = x > > > > > > and > > > > > > foo2 :: Int -> Int > > > foo2 !x = x > > > > > > have the same runtime behavior, they have different types, so you can't > > > pass a regular Int to foo1. Is that desirable? > > > > Yes. Actually, you have a good point that we'd like to have functions > > 'force :: Int -> !Int' and 'suspend :: !Int -> Int'. 
Unfortunately, we > > can't generate 'Coercible' instances for these types unless Coercible > > becomes > > polykinded. Perhaps we can make a new type class, or just magic > > polymorphic functions. > > > > Edward From dan.doel at gmail.com Fri Sep 4 20:09:26 2015 From: dan.doel at gmail.com (Dan Doel) Date: Fri, 4 Sep 2015 16:09:26 -0400 Subject: Unlifted data types In-Reply-To: <1441390306-sup-6240@sabre> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> Message-ID: Okay. That answers another question I had, which was whether MutVar# and such would go in the new kind. So now we have partial, extended natural numbers: data PNat :: * where PZero :: PNat PSuc :: PNat -> PNat A flat domain of natural numbers: data FNat :: * where FZero :: FNat FSuc :: !FNat -> FNat And two sets of natural numbers: Force FNat :: Unlifted data UNat :: Unlifted where UZero :: UNat USuc :: UNat -> UNat And really perhaps two flat domains (and three sets), if you use Force instead of !, which would differ on who ensures the evaluation. That's kind of a lot of incompatible definitions of essentially the same thing (PNat being the significantly different thing). I was kind of more enthused about first class !a. For instance, if you think about the opening quote by Bob Harper, he's basically wrong. The flat domain FNat is the natural numbers (existing in an overall lazy language), and has the reasoning properties he wants to teach students about with very little complication. It'd be satisfying to recognize that unlifting the outer-most part gets you exactly there, with whatever performance characteristics that implies. Or to get rid of ! and use Unlifted definitions instead. Maybe backwards compatibility mandates the duplication, but it'd be nice if some synthesis could be reached. ---- It'd also be good to think about/specify how this is going to interact with unpacked/unboxed sums. On Fri, Sep 4, 2015 at 2:12 PM, Edward Z. 
Yang wrote: > Excerpts from Dan Doel's message of 2015-09-04 09:57:42 -0700: >> All your examples are non-recursive types. So, if I have: >> >> data Nat = Zero | Suc Nat >> >> what is !Nat? Does it just have the outer-most part unlifted? > > Just the outermost part. > >> Is the intention to make the !a in data type declarations first-class, >> so that when we say: >> >> data Nat = Zero | Suc !Nat >> >> the !Nat part is now an entity in itself, and it is, for this >> declaration, the set of naturals, whereas Nat is the flat domain? > > No, in fact, there is a semantic difference between this and strict > fields (which Paul pointed out to me.) There's now an updated proposal > on the Trac which partially solves this problem. > > Edward From ezyang at mit.edu Fri Sep 4 21:23:33 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 14:23:33 -0700 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> Message-ID: <1441400654-sup-1647@sabre> Excerpts from Dan Doel's message of 2015-09-04 13:09:26 -0700: > Okay. That answers another question I had, which was whether MutVar# > and such would go in the new kind. > > So now we have partial, extended natural numbers: > > data PNat :: * where > PZero :: PNat > PSuc :: PNat -> PNat > > A flat domain of natural numbers: > > data FNat :: * where > FZero :: FNat > FSuc :: !FNat -> FNat > > And two sets of natural numbers: > > Force FNat :: Unlifted > > data UNat :: Unlifted where > UZero :: UNat > USuc :: UNat -> UNat > > And really perhaps two flat domains (and three sets), if you use Force > instead of !, which would differ on who ensures the evaluation. That's > kind of a lot of incompatible definitions of essentially the same > thing (PNat being the significantly different thing). > > I was kind of more enthused about first class !a. For instance, if you > think about the opening quote by Bob Harper, he's basically wrong. 
The > flat domain FNat is the natural numbers (existing in an overall lazy > language), and has the reasoning properties he wants to teach students > about with very little complication. It'd be satisfying to recognize > that unlifting the outer-most part gets you exactly there, with > whatever performance characteristics that implies. Or to get rid of ! > and use Unlifted definitions instead. > > Maybe backwards compatibility mandates the duplication, but it'd be > nice if some synthesis could be reached. I would certainly agree that in terms of the data that is representable, there is not much difference; but there is a lot of difference for the client between Force and a strict field. If I write: let x = undefined y = Strict x in True No error occurs with: data Strict = Strict !a But an error occurs with: data Strict = Strict (Force a) One possibility for how to reconcile the difference for BC is to posit that there are just two different constructors: Strict :: a -> Strict a Strict! :: Force a -> Strict a But this kind of special handling is a bit bothersome. Consider: data SPair a b = SPair (!a, !b) The constructor has what type? Probably SPair :: (Force a, Force b) -> SPair a and not: SPair :: (a, b) -> SPair a > It'd also be good to think about/specify how this is going to interact > with unpacked/unboxed sums. I don't think it interacts any differently than with unpacked/unboxed products today. Edward From eir at cis.upenn.edu Fri Sep 4 21:26:45 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Fri, 4 Sep 2015 14:26:45 -0700 Subject: A process for reporting security-sensitive issues In-Reply-To: <55E93114.6050800@well-typed.com> References: <55E93114.6050800@well-typed.com> Message-ID: I agree with Adam. I've been a little worried about users relying on Safe Haskell, despite #9562. Advertising that Safe Haskell is just a "best effort" (for a rather high bar for "best") but not a guarantee would be nice. 
Richard On Sep 3, 2015, at 10:50 PM, Adam Gundry wrote: > On 03/09/15 08:22, Michael Smith wrote: >> I feel there should be some process for reporting security-sensitive issues >> in GHC -- for example, #9562 and #10826 in Trac. Perhaps something like the >> SensitiveTicketsPlugin [3] could be used? >> >> [1] https://ghc.haskell.org/trac/ghc/ticket/9562 >> [2] https://ghc.haskell.org/trac/ghc/ticket/10826 >> [3] https://trac-hacks.org/wiki/SensitiveTicketsPlugin > > Thanks for raising this. While I see where you are coming from, I'm > going to argue against it, because I think it creates a false impression > of the security guarantees GHC provides. Such a process may give the > impression that there are people directly tasked with handling such > security bugs, which is not currently the case. > > I think it is unreasonable for the security of a system to depend on GHC > having no type soundness bugs, particularly since GHC is actively used > for developing experimental type system features. #9562 has been open > for a year and we don't have a good solution. > > Relatedly, I think the Safe Haskell documentation should prominently > warn about the existence of #9562 and the possibility of other type > soundness bugs, like it does for compilation safety issues. > > What do others think? > > Adam > > > -- > Adam Gundry, Haskell Consultant > Well-Typed LLP, http://www.well-typed.com/ > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From roma at ro-che.info Fri Sep 4 21:41:38 2015 From: roma at ro-che.info (Roman Cheplyaka) Date: Sat, 5 Sep 2015 00:41:38 +0300 Subject: Unlifted data types In-Reply-To: <1441400654-sup-1647@sabre> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> Message-ID: <55EA1012.2070708@ro-che.info> On 05/09/15 00:23, Edward Z. 
Yang wrote: > I would certainly agree that in terms of the data that is representable, > there is not much difference; but there is a lot of difference for the > client between Force and a strict field. If I write: > > let x = undefined > y = Strict x > in True > > No error occurs with: > > data Strict = Strict !a > > But an error occurs with: > > data Strict = Strict (Force a) At what point does the error occur here? When evaluating True? What about the following two expressions? const False (let x = undefined y = Strict x in True) let x = undefined y = const False (Strict x) in True Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From roma at ro-che.info Fri Sep 4 21:43:19 2015 From: roma at ro-che.info (Roman Cheplyaka) Date: Sat, 5 Sep 2015 00:43:19 +0300 Subject: Unlifted data types In-Reply-To: <55EA1012.2070708@ro-che.info> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <55EA1012.2070708@ro-che.info> Message-ID: <55EA1077.9010705@ro-che.info> On 05/09/15 00:41, Roman Cheplyaka wrote: > On 05/09/15 00:23, Edward Z. Yang wrote: >> I would certainly agree that in terms of the data that is representable, >> there is not much difference; but there is a lot of difference for the >> client between Force and a strict field. If I write: >> >> let x = undefined >> y = Strict x >> in True >> >> No error occurs with: >> >> data Strict = Strict !a >> >> But an error occurs with: >> >> data Strict = Strict (Force a) > > At what point does the error occur here? When evaluating True? > > What about the following two expressions? > > const False > (let x = undefined > y = Strict x > in True) > > let x = undefined > y = const False (Strict x) > in True On second thought, the second one shouldn't even compile because of the kind error, right?
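The strict-field half of this question can be checked empirically with today's GHC. A minimal, self-contained sketch (LazyBox and StrictBox are illustrative stand-ins, not names from the proposal): building the `let` thunks raises nothing, and the exception only appears once the saturated constructor application is itself forced to weak head normal form.

```haskell
import Control.Exception (SomeException, evaluate, try)

data LazyBox a   = LazyBox a     -- lazy field: the argument stays a thunk
data StrictBox a = StrictBox !a  -- strict field: forced when the constructor application is evaluated

main :: IO ()
main = do
  -- Merely building the bindings forces nothing; True prints unharmed.
  let x = undefined :: Int
      y = StrictBox x
  print (const True y)
  -- Forcing the constructor application to WHNF evaluates the strict field.
  r <- try (evaluate (StrictBox x)) :: IO (Either SomeException (StrictBox Int))
  putStrLn (either (const "StrictBox: exception") (const "StrictBox: ok") r)
  -- A lazy field never forces its argument.
  _ <- evaluate (LazyBox x)
  putStrLn "LazyBox: ok"
```

Under the proposal as quoted above, `Force a` would instead make the error surface at the construction site itself, which is the difference being discussed here.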
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From dan.doel at gmail.com Fri Sep 4 21:48:49 2015 From: dan.doel at gmail.com (Dan Doel) Date: Fri, 4 Sep 2015 17:48:49 -0400 Subject: Unlifted data types In-Reply-To: <1441400654-sup-1647@sabre> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> Message-ID: On Fri, Sep 4, 2015 at 5:23 PM, Edward Z. Yang wrote: > But this kind of special handling is a bit bothersome. Consider: > > data SPair a b = SPair (!a, !b) > > The constructor has what type? Probably > > SPair :: (Force a, Force b) -> SPair a > > and not: > > SPair :: (a, b) -> SPair a I don't really understand what this example is showing. I don't think SPair is a legal declaration in any scenario. - In current Haskell it's illegal; you can only put ! directly on fields - If !a :: Unlifted, then (,) (!a) is a kind error (same with Force a) > I don't think it interacts any differently than with unpacked/unboxed > products today. I meant like: If T :: Unlifted, then am I allowed to do: data U = MkU {-# UNPACK #-} T ... and what are its semantics? If T is a sum, presumably it's related to the unpacked sums proposal from a couple days ago. Does stuff from this proposal make that proposal simpler? Should they reference things in one another? Will there be optimizations that turn: data E a b :: Unlifted where L :: a -> E a b R :: b -> E a b into |# a , b #| (or whatever the agreed upon syntax is)? Presumably yes. 
-- Dan From dan.doel at gmail.com Fri Sep 4 21:56:03 2015 From: dan.doel at gmail.com (Dan Doel) Date: Fri, 4 Sep 2015 17:56:03 -0400 Subject: Unlifted data types In-Reply-To: <55EA1012.2070708@ro-che.info> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <55EA1012.2070708@ro-che.info> Message-ID: If x :: t, and t :: Unlifted, then let x = e in e' has a value that depends on evaluating e regardless of its use in e' (or other things in the let, if they exist). It would be like writing let !x = e in e' today. -- Dan On Fri, Sep 4, 2015 at 5:41 PM, Roman Cheplyaka wrote: > On 05/09/15 00:23, Edward Z. Yang wrote: >> I would certainly agree that in terms of the data that is representable, >> there is not much difference; but there is a lot of difference for the >> client between Force and a strict field. If I write: >> >> let x = undefined >> y = Strict x >> in True >> >> No error occurs with: >> >> data Strict = Strict !a >> >> But an error occurs with: >> >> data Strict = Strict (Force a) > > At what point does the error occur here? When evaluating True? > > What about the following two expressions? > > const False > (let x = undefined > y = Strict x > in True) > > let x = undefined > y = const False (Strict x) > in True > > Roman > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From mike at izbicki.me Fri Sep 4 23:39:24 2015 From: mike at izbicki.me (Mike Izbicki) Date: Fri, 4 Sep 2015 16:39:24 -0700 Subject: question about GHC API on GHC plugin In-Reply-To: References: <1439014742-sup-2126@sabre> Message-ID: I'm still having trouble creating Core code that can extract superclass dictionaries from a given dictionary. I suspect the problem is that I don't actually understand what the Core code to do this is supposed to look like. 
I keep getting the errors mentioned above when I try what I think should work. Can anyone help me figure this out? Is there any chance this is a bug in how GHC parses Core? On Tue, Aug 25, 2015 at 9:24 PM, Mike Izbicki wrote: > The purpose of the plugin is to automatically improve the numerical > stability of Haskell code. It is supposed to identify numeric > expressions, then use Herbie (https://github.com/uwplse/herbie) to > generate a numerically stable version, then rewrite the numerically > stable version back into the code. The first two steps were really > easy. It's the last step of inserting back into the code that I'm > having tons of trouble with. Core is a lot more complicated than I > thought :) > > I'm not sure what you mean by the CoreExpr representation? Here's the > output of the pretty printer you gave: > App (App (App (App (Var Id{+,r2T,ForAllTy TyVar{a} (FunTy (TyConApp > Num [TyVarTy TyVar{a}]) (FunTy (TyVarTy TyVar{a}) (FunTy (TyVarTy > TyVar{a}) (TyVarTy TyVar{a})))),VanillaId,Info{0,SpecInfo [] > ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD > {strd = Lazy, absd = Use Many Used},0}}) (Type (TyVarTy TyVar{a}))) > (App (Var Id{$p1Fractional,rh3,ForAllTy TyVar{a} (FunTy (TyConApp > Fractional [TyVarTy TyVar{a}]) (TyConApp Num [TyVarTy > TyVar{a}])),ClassOpId ,Info{1,SpecInfo [BuiltinRule {ru_name = > "Class op $p1Fractional", ru_fn = $p1Fractional, ru_nargs = 2, ru_try > = }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd > [Str HeadStr,Lazy,Lazy,Lazy]), absd = Use Many (UProd [Use Many > Used,Abs,Abs,Abs])}] (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many > Used},0}}) (App (Var 
Id{$p1Floating,rh2,ForAllTy TyVar{a} (FunTy > (TyConApp Floating [TyVarTy TyVar{a}]) (TyConApp Fractional [TyVarTy > TyVar{a}])),ClassOpId ,Info{1,SpecInfo [BuiltinRule {ru_name = > "Class op $p1Floating", ru_fn = $p1Floating, ru_nargs = 2, ru_try = > }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd > [Str HeadStr,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy]), > absd = Use Many (UProd [Use Many > Used,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs])}] > (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many Used},0}}) (Var > Id{$dFloating,aBM,TyConApp Floating [TyVarTy > TyVar{a}],VanillaId,Info{0,SpecInfo [] > ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD > {strd = Lazy, absd = Use Many Used},0}})))) (Var Id{x1,anU,TyVarTy > TyVar{a},VanillaId,Info{0,SpecInfo [] > ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD > {strd = Lazy, absd = Use Many Used},0}})) (Var Id{x1,anU,TyVarTy > TyVar{a},VanillaId,Info{0,SpecInfo [] > ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma > {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = > Nothing, inl_act = AlwaysActive, inl_rule = > FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD > {strd = Lazy, absd = Use Many Used},0}}) > > You can find my pretty printer (and all the other code for the plugin) > at: https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L627 > > The function getDictMap 
> (https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L171) > is where I'm constructing the dictionaries that are getting inserted > back into the Core. > > On Tue, Aug 25, 2015 at 7:17 PM, Ömer Sinan Ağacan wrote: >> It seems like in your App syntax you're having a non-function in function >> position. You can see this by looking at what the failing function >> (splitFunTy_maybe) is doing: >> >> splitFunTy_maybe :: Type -> Maybe (Type, Type) >> -- ^ Attempts to extract the argument and result types from a type >> ... (definition is not important) ... >> >> Then it's used like this at the error site: >> >> (arg_ty, res_ty) = expectJust "cpeBody:collect_args" $ >> splitFunTy_maybe fun_ty >> >> In your case this function is returning Nothing and then expectJust is >> signalling the panic. >> >> Your code looked correct to me, I don't see any problems with that. Maybe you're >> using something wrong as selectors. Could you paste the CoreExpr representation of >> your program? >> >> It may also be the case that the panic is caused by something else, maybe your >> syntax is invalidating some assumptions/invariants in GHC but it's not >> immediately checked etc. Working at the Core level is frustrating at times. >> >> Can I ask what kind of plugin are you working on? >> >> (Btw, how did you generate this representation of the AST? Did you write it >> manually? If you have a pretty-printer, would you mind sharing it?) >> >> 2015-08-25 18:50 GMT-04:00 Mike Izbicki : >>> Thanks Ömer! >>> >>> I'm able to get dictionaries for the superclasses of a class now, but >>> I get an error whenever I try to get a dictionary for a >>> super-superclass.
Here's the Haskell expression I'm working with: >>> >>> test1 :: Floating a => a -> a >>> test1 x1 = x1+x1 >>> >>> The original core is: >>> >>> + @ a $dNum_aJu x1 x1 >>> >>> But my plugin is replacing it with the core: >>> >>> + @ a ($p1Fractional ($p1Floating $dFloating_aJq)) x1 x1 >>> >>> The only difference is the way I'm getting the Num dictionary. The >>> corresponding AST (annotated with variable names and types) is: >>> >>> App >>> (App >>> (App >>> (App >>> (Var +::forall a. Num a => a -> a -> a) >>> (Type a) >>> ) >>> (App >>> (Var $p1Fractional::forall a. Fractional a => Num a) >>> (App >>> (Var $p1Floating::forall a. Floating a => Fractional a) >>> (Var $dFloating_aJq::Floating a) >>> ) >>> ) >>> ) >>> (Var x1::'a') >>> ) >>> (Var x1::'a') >>> >>> When I insert, GHC gives the following error: >>> >>> ghc: panic! (the 'impossible' happened) >>> (GHC version 7.10.1 for x86_64-unknown-linux): >>> expectJust cpeBody:collect_args >>> >>> What am I doing wrong with extracting these super-superclass >>> dictionaries? I've looked up the code for cpeBody in GHC, but I can't >>> figure out what it's trying to do, so I'm not sure why it's failing on >>> my core. >>> >>> On Mon, Aug 24, 2015 at 7:10 PM, Ömer Sinan Ağacan wrote: >>>> Mike, here's a piece of code that may be helpful to you: >>>> >>>> https://github.com/osa1/sc-plugin/blob/master/src/Supercompilation/Show.hs >>>> >>>> Copy this module to your plugin; it doesn't have any dependencies other than >>>> ghc itself. When your plugin is initialized, update `dynFlags_ref` with your >>>> DynFlags as the first thing you do. Then use the Show instance to print the AST directly. >>>> >>>> Horrible hack, but very useful for learning purposes. In fact, I don't know how >>>> else we can learn what Core is generated for a given piece of code, and reverse-engineer >>>> to figure out details. >>>> >>>> Hope it helps.
>>>> >>>> 2015-08-24 21:59 GMT-04:00 ?mer Sinan A?acan : >>>>>> Lets say I'm running the plugin on a function with signature `Floating a => a >>>>>> -> a`, then the plugin has access to the `Floating` dictionary for the type. >>>>>> But if I want to add two numbers together, I need the `Num` dictionary. I >>>>>> know I should have access to `Num` since it's a superclass of `Floating`. >>>>>> How can I get access to these superclass dictionaries? >>>>> >>>>> I don't have a working code for this but this should get you started: >>>>> >>>>> let ord_dictionary :: Id = ... >>>>> ord_class :: Class = ... >>>>> in >>>>> mkApps (Var (head (classSCSels ord_class))) [Var ord_dictionary] >>>>> >>>>> I don't know how to get Class for Ord. I do `head` here because in the case of >>>>> Ord we only have one superclass so `classSCSels` should have one Id. Then I >>>>> apply ord_dictionary to this selector and it should return dictionary for Eq. >>>>> >>>>> I assumed you already have ord_dictionary, it should be passed to your function >>>>> already if you had `(Ord a) => ` in your function. >>>>> >>>>> >>>>> Now I realized you asked for getting Num from Floating. I think you should >>>>> follow a similar path except you need two applications, first to get Fractional >>>>> from Floating and second to get Num from Fractional: >>>>> >>>>> mkApps (Var (head (classSCSels fractional_class))) >>>>> [mkApps (Var (head (classSCSels floating_class))) >>>>> [Var floating_dictionary]] >>>>> >>>>> Return value should be a Num dictionary. From dan.doel at gmail.com Sat Sep 5 01:21:29 2015 From: dan.doel at gmail.com (Dan Doel) Date: Fri, 4 Sep 2015 21:21:29 -0400 Subject: Unlifted data types In-Reply-To: <1441400654-sup-1647@sabre> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> Message-ID: Here are some additional thoughts. 
If we examine an analogue of some of your examples:

    data MutVar a = MV (MutVar# RealWorld a)

    main = do
      let mv# = undefined
      let mv = MV mv#
      putStrLn "Okay."

The above is illegal. Instead we _must_ write:

    let !mv# = undefined

which signals that evaluation is occurring. So it is impossible to
accidentally go from:

    main = do
      let mv = MV undefined
      putStrLn "Okay."

which prints "Okay.", to something that throws an exception, without
having a pretty good indication that you're doing so. I would guess
this is desirable, so perhaps it should be mandated for Unlifted as
well.

----

However, the above point confuses me with respect to another example.
The proposal says that:

    data Id :: * -> Unlifted where
      Id :: a -> Id a

could/should be compiled with no overhead over `a`, like a newtype.
However, if Unlifted things have operational semantics like #, what
does the following do:

    let x :: Id a
        !x = Id undefined

The ! should evaluate to the Id constructor, but we're not representing
it, so it actually doesn't evaluate anything? But:

    let x :: Id a
        !x = undefined

throws an exception? Whereas for newtypes, both throw exceptions with a
!x definition, or don't with an x definition? Is it actually possible
to make Id behave this way without any representational overhead? I'm a
little skeptical. I think that only Force (and Box) might be able to
have no representational overhead.

-- Dan

On Fri, Sep 4, 2015 at 5:23 PM, Edward Z. Yang wrote:
> Excerpts from Dan Doel's message of 2015-09-04 13:09:26 -0700:
>> Okay. That answers another question I had, which was whether MutVar#
>> and such would go in the new kind.
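The legal variant of Dan's example runs with stock GHC today (using error rather than undefined, since whether undefined is accepted at an unlifted kind is exactly the special-casing discussed later in this thread); a minimal check:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (MutVar#, RealWorld)

data MutVar a = MV (MutVar# RealWorld a)

main :: IO ()
main = do
  -- mv is an ordinary lazy binding and nothing ever demands it, so the
  -- error inside the constructor application is never forced.
  let mv = MV (error "never evaluated")
  putStrLn "Okay."
```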
>>
>> So now we have partial, extended natural numbers:
>>
>>     data PNat :: * where
>>       PZero :: PNat
>>       PSuc  :: PNat -> PNat
>>
>> A flat domain of natural numbers:
>>
>>     data FNat :: * where
>>       FZero :: FNat
>>       FSuc  :: !FNat -> FNat
>>
>> And two sets of natural numbers:
>>
>>     Force FNat :: Unlifted
>>
>>     data UNat :: Unlifted where
>>       UZero :: UNat
>>       USuc  :: UNat -> UNat
>>
>> And really perhaps two flat domains (and three sets), if you use Force
>> instead of !, which would differ on who ensures the evaluation. That's
>> kind of a lot of incompatible definitions of essentially the same
>> thing (PNat being the significantly different thing).
>>
>> I was kind of more enthused about first-class !a. For instance, if you
>> think about the opening quote by Bob Harper, he's basically wrong. The
>> flat domain FNat is the natural numbers (existing in an overall lazy
>> language), and has the reasoning properties he wants to teach students
>> about with very little complication. It'd be satisfying to recognize
>> that unlifting the outer-most part gets you exactly there, with
>> whatever performance characteristics that implies. Or to get rid of !
>> and use Unlifted definitions instead.
>>
>> Maybe backwards compatibility mandates the duplication, but it'd be
>> nice if some synthesis could be reached.

> I would certainly agree that in terms of the data that is representable,
> there is not much difference; but there is a lot of difference for the
> client between Force and a strict field. If I write:
>
>     let x = undefined
>         y = Strict x
>     in True
>
> No error occurs with:
>
>     data Strict a = Strict !a
>
> But an error occurs with:
>
>     data Strict a = Strict (Force a)
>
> One possibility for how to reconcile the difference for BC is to posit
> that there are just two different constructors:
>
>     Strict  :: a -> Strict a
>     Strict! :: Force a -> Strict a
>
> But this kind of special handling is a bit bothersome.
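The strict-field half of Edward's comparison is checkable with stock GHC (the Force variant is from the proposal and not runnable today); a minimal sketch:

```haskell
data Strict a = Strict !a

main :: IO ()
main = do
  let x = undefined
      y = Strict x
  -- y itself is a thunk; the strict field would force x only when the
  -- constructor application is evaluated, and nothing ever forces y.
  putStrLn "no error"
```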
Consider: > > data SPair a b = SPair (!a, !b) > > The constructor has what type? Probably > > SPair :: (Force a, Force b) -> SPair a > > and not: > > SPair :: (a, b) -> SPair a > >> It'd also be good to think about/specify how this is going to interact >> with unpacked/unboxed sums. > > I don't think it interacts any differently than with unpacked/unboxed > products today. > > Edward From ezyang at mit.edu Sat Sep 5 03:38:27 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Fri, 04 Sep 2015 20:38:27 -0700 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> Message-ID: <1441423737-sup-9277@sabre> Excerpts from Dan Doel's message of 2015-09-04 14:48:49 -0700: > I don't really understand what this example is showing. I don't think > SPair is a legal > declaration in any scenario. > > - In current Haskell it's illegal; you can only put ! directly on fields > - If !a :: Unlifted, then (,) (!a) is a kind error (same with Force a) This is true. Perhaps it should be possible to define data types which are levity polymorphic, so SPair can kind as * -> * -> *, Unlifted -> Unlifted -> *, etc. > > I don't think it interacts any differently than with unpacked/unboxed > > products today. > > I meant like: > > If T :: Unlifted, then am I allowed to do: > > data U = MkU {-# UNPACK #-} T ... > > and what are its semantics? If T is a sum, presumably it's related to > the unpacked > sums proposal from a couple days ago. Does stuff from this proposal > make that proposal > simpler? Should they reference things in one another? Ah, this is a good question. I think you can just directly UNPACK unlifted types, without a strict bang pattern. I've added a note to the proposal. > Will there be optimizations that turn: > > data E a b :: Unlifted where > L :: a -> E a b > R :: b -> E a b > > into |# a , b #| (or whatever the agreed upon syntax is)? Presumably yes. 
Yes, it should follow the same rules as
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes#Unpacking

Edward

From omeragacan at gmail.com  Sat Sep  5 04:16:34 2015
From: omeragacan at gmail.com (=?UTF-8?Q?=C3=96mer_Sinan_A=C4=9Facan?=)
Date: Sat, 5 Sep 2015 00:16:34 -0400
Subject: question about GHC API on GHC plugin
In-Reply-To: 
References: <1439014742-sup-2126@sabre>
Message-ID: 

Hi Mike,

I'll try to hack an example for you some time tomorrow (I'm returning from ICFP
and have some long flights ahead of me).

But in the meantime, here's a working Core code, generated by GHC:

    f_rjH :: forall a_alz. Ord a_alz => a_alz -> Bool
    f_rjH =
      \ (@ a_aCH) ($dOrd_aCI :: Ord a_aCH) (eta_B1 :: a_aCH) ->
        == @ a_aCH (GHC.Classes.$p1Ord @ a_aCH $dOrd_aCI) eta_B1 eta_B1

You can clearly see here how the Eq dictionary is selected from the Ord
dictionary ($dOrd_aCI in the example); it's just an application of the
selector to a type and a dictionary, that's all.

This is generated from this code:

    {-# NOINLINE f #-}
    f :: Ord a => a -> Bool
    f x = x == x

Compile it with this:

    ghc --make -fforce-recomp -O0 -ddump-simpl -ddump-to-file Main.hs
        -dsuppress-idinfo

> Can anyone help me figure this out? Is there any chance this is a bug in how
> GHC parses Core?

This seems unlikely, because GHC doesn't have a Core parser and there's no Core
parsing going on here, you're parsing your Code in the form of AST (CoreExpr,
CoreProgram etc. defined in CoreSyn.hs). Did you mean something else and am I
misunderstanding?

2015-09-04 19:39 GMT-04:00 Mike Izbicki :
> I'm still having trouble creating Core code that can extract
> superclass dictionaries from a given dictionary. I suspect the
> problem is that I don't actually understand what the Core code to do
> this is supposed to look like. I keep getting the errors mentioned
> above when I try what I think should work.
>
> Can anyone help me figure this out? Is there any chance this is a bug
> in how GHC parses Core?
> > On Tue, Aug 25, 2015 at 9:24 PM, Mike Izbicki wrote: >> The purpose of the plugin is to automatically improve the numerical >> stability of Haskell code. It is supposed to identify numeric >> expressions, then use Herbie (https://github.com/uwplse/herbie) to >> generate a numerically stable version, then rewrite the numerically >> stable version back into the code. The first two steps were really >> easy. It's the last step of inserting back into the code that I'm >> having tons of trouble with. Core is a lot more complicated than I >> thought :) >> >> I'm not sure what you mean by the CoreExpr representation? Here's the >> output of the pretty printer you gave: >> App (App (App (App (Var Id{+,r2T,ForAllTy TyVar{a} (FunTy (TyConApp >> Num [TyVarTy TyVar{a}]) (FunTy (TyVarTy TyVar{a}) (FunTy (TyVarTy >> TyVar{a}) (TyVarTy TyVar{a})))),VanillaId,Info{0,SpecInfo [] >> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >> {strd = Lazy, absd = Use Many Used},0}}) (Type (TyVarTy TyVar{a}))) >> (App (Var Id{$p1Fractional,rh3,ForAllTy TyVar{a} (FunTy (TyConApp >> Fractional [TyVarTy TyVar{a}]) (TyConApp Num [TyVarTy >> TyVar{a}])),ClassOpId ,Info{1,SpecInfo [BuiltinRule {ru_name = >> "Class op $p1Fractional", ru_fn = $p1Fractional, ru_nargs = 2, ru_try >> = }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd >> [Str HeadStr,Lazy,Lazy,Lazy]), absd = Use Many (UProd [Use Many >> Used,Abs,Abs,Abs])}] (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many >> Used},0}}) (App (Var Id{$p1Floating,rh2,ForAllTy TyVar{a} (FunTy >> (TyConApp Floating [TyVarTy TyVar{a}]) (TyConApp Fractional [TyVarTy >> TyVar{a}])),ClassOpId 
,Info{1,SpecInfo [BuiltinRule {ru_name = >> "Class op $p1Floating", ru_fn = $p1Floating, ru_nargs = 2, ru_try = >> }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd >> [Str HeadStr,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy]), >> absd = Use Many (UProd [Use Many >> Used,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs])}] >> (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many Used},0}}) (Var >> Id{$dFloating,aBM,TyConApp Floating [TyVarTy >> TyVar{a}],VanillaId,Info{0,SpecInfo [] >> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >> {strd = Lazy, absd = Use Many Used},0}})))) (Var Id{x1,anU,TyVarTy >> TyVar{a},VanillaId,Info{0,SpecInfo [] >> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >> {strd = Lazy, absd = Use Many Used},0}})) (Var Id{x1,anU,TyVarTy >> TyVar{a},VanillaId,Info{0,SpecInfo [] >> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >> Nothing, inl_act = AlwaysActive, inl_rule = >> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >> {strd = Lazy, absd = Use Many Used},0}}) >> >> You can find my pretty printer (and all the other code for the plugin) >> at: https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L627 >> >> The function getDictMap >> (https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L171) >> is where I'm 
constructing the dictionaries that are getting inserted >> back into the Core. >> >> On Tue, Aug 25, 2015 at 7:17 PM, ?mer Sinan A?acan wrote: >>> It seems like in your App syntax you're having a non-function in function >>> position. You can see this by looking at what failing function >>> (splitFunTy_maybe) is doing: >>> >>> splitFunTy_maybe :: Type -> Maybe (Type, Type) >>> -- ^ Attempts to extract the argument and result types from a type >>> ... (definition is not important) ... >>> >>> Then it's used like this at the error site: >>> >>> (arg_ty, res_ty) = expectJust "cpeBody:collect_args" $ >>> splitFunTy_maybe fun_ty >>> >>> In your case this function is returning Nothing and then exceptJust is >>> signalling the panic. >>> >>> Your code looked correct to me, I don't see any problems with that. Maybe you're >>> using something wrong as selectors. Could you paste CoreExpr representation of >>> your program? >>> >>> It may also be the case that the panic is caused by something else, maybe your >>> syntax is invalidating some assumptions/invariants in GHC but it's not >>> immediately checked etc. Working at the Core level is frustrating at times. >>> >>> Can I ask what kind of plugin are you working on? >>> >>> (Btw, how did you generate this representation of AST? Did you write it >>> manually? If you have a pretty-printer, would you mind sharing it?) >>> >>> 2015-08-25 18:50 GMT-04:00 Mike Izbicki : >>>> Thanks ?mer! >>>> >>>> I'm able to get dictionaries for the superclasses of a class now, but >>>> I get an error whenever I try to get a dictionary for a >>>> super-superclass. Here's the Haskell expression I'm working with: >>>> >>>> test1 :: Floating a => a -> a >>>> test1 x1 = x1+x1 >>>> >>>> The original core is: >>>> >>>> + @ a $dNum_aJu x1 x1 >>>> >>>> But my plugin is replacing it with the core: >>>> >>>> + @ a ($p1Fractional ($p1Floating $dFloating_aJq)) x1 x1 >>>> >>>> The only difference is the way I'm getting the Num dictionary. 
The >>>> corresponding AST (annotated with variable names and types) is: >>>> >>>> App >>>> (App >>>> (App >>>> (App >>>> (Var +::forall a. Num a => a -> a -> a) >>>> (Type a) >>>> ) >>>> (App >>>> (Var $p1Fractional::forall a. Fractional a => Num a) >>>> (App >>>> (Var $p1Floating::forall a. Floating a => Fractional a) >>>> (Var $dFloating_aJq::Floating a) >>>> ) >>>> ) >>>> ) >>>> (Var x1::'a') >>>> ) >>>> (Var x1::'a') >>>> >>>> When I insert, GHC gives the following error: >>>> >>>> ghc: panic! (the 'impossible' happened) >>>> (GHC version 7.10.1 for x86_64-unknown-linux): >>>> expectJust cpeBody:collect_args >>>> >>>> What am I doing wrong with extracting these super-superclass >>>> dictionaries? I've looked up the code for cpeBody in GHC, but I can't >>>> figure out what it's trying to do, so I'm not sure why it's failing on >>>> my core. >>>> >>>> On Mon, Aug 24, 2015 at 7:10 PM, ?mer Sinan A?acan wrote: >>>>> Mike, here's a piece of code that may be helpful to you: >>>>> >>>>> https://github.com/osa1/sc-plugin/blob/master/src/Supercompilation/Show.hs >>>>> >>>>> Copy this module to your plugin, it doesn't have any dependencies other than >>>>> ghc itself. When your plugin is initialized, update `dynFlags_ref` with your >>>>> DynFlags as first thing to do. Then use Show instance to print AST directly. >>>>> >>>>> Horrible hack, but very useful for learning purposes. In fact, I don't know how >>>>> else we can learn what Core is generated for a given code, and reverse-engineer >>>>> to figure out details. >>>>> >>>>> Hope it helps. >>>>> >>>>> 2015-08-24 21:59 GMT-04:00 ?mer Sinan A?acan : >>>>>>> Lets say I'm running the plugin on a function with signature `Floating a => a >>>>>>> -> a`, then the plugin has access to the `Floating` dictionary for the type. >>>>>>> But if I want to add two numbers together, I need the `Num` dictionary. I >>>>>>> know I should have access to `Num` since it's a superclass of `Floating`. 
>>>>>>> How can I get access to these superclass dictionaries? >>>>>> >>>>>> I don't have a working code for this but this should get you started: >>>>>> >>>>>> let ord_dictionary :: Id = ... >>>>>> ord_class :: Class = ... >>>>>> in >>>>>> mkApps (Var (head (classSCSels ord_class))) [Var ord_dictionary] >>>>>> >>>>>> I don't know how to get Class for Ord. I do `head` here because in the case of >>>>>> Ord we only have one superclass so `classSCSels` should have one Id. Then I >>>>>> apply ord_dictionary to this selector and it should return dictionary for Eq. >>>>>> >>>>>> I assumed you already have ord_dictionary, it should be passed to your function >>>>>> already if you had `(Ord a) => ` in your function. >>>>>> >>>>>> >>>>>> Now I realized you asked for getting Num from Floating. I think you should >>>>>> follow a similar path except you need two applications, first to get Fractional >>>>>> from Floating and second to get Num from Fractional: >>>>>> >>>>>> mkApps (Var (head (classSCSels fractional_class))) >>>>>> [mkApps (Var (head (classSCSels floating_class))) >>>>>> [Var floating_dictionary]] >>>>>> >>>>>> Return value should be a Num dictionary. > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From omeragacan at gmail.com Sat Sep 5 04:18:51 2015 From: omeragacan at gmail.com (=?UTF-8?Q?=C3=96mer_Sinan_A=C4=9Facan?=) Date: Sat, 5 Sep 2015 00:18:51 -0400 Subject: question about GHC API on GHC plugin In-Reply-To: References: <1439014742-sup-2126@sabre> Message-ID: Typo: "You're parsing your code" I mean "You're passing your code" 2015-09-05 0:16 GMT-04:00 ?mer Sinan A?acan : > Hi Mike, > > I'll try to hack an example for you some time tomorrow(I'm returning from ICFP > and have some long flights ahead of me). > > But in the meantime, here's a working Core code, generated by GHC: > > f_rjH :: forall a_alz. 
Ord a_alz => a_alz -> Bool
>     f_rjH =
>       \ (@ a_aCH) ($dOrd_aCI :: Ord a_aCH) (eta_B1 :: a_aCH) ->
>         == @ a_aCH (GHC.Classes.$p1Ord @ a_aCH $dOrd_aCI) eta_B1 eta_B1
[...]
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From ezyang at mit.edu  Sat Sep  5 07:06:26 2015
From: ezyang at mit.edu (Edward Z. Yang)
Date: Sat, 05 Sep 2015 00:06:26 -0700
Subject: Unlifted data types
In-Reply-To: 
References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre>
 <1441400654-sup-1647@sabre>
Message-ID: <1441436053-sup-5590@sabre>

Excerpts from Dan Doel's message of 2015-09-04 18:21:29 -0700:
> Here are some additional thoughts.
>
> If we examine an analogue of some of your examples:
>
>     data MutVar a = MV (MutVar# RealWorld a)
>
>     main = do
>       let mv# = undefined
>       let mv = MV mv#
>       putStrLn "Okay."
>
> The above is illegal.
> Instead we _must_ write:

This doesn't typecheck, but for a different reason: undefined :: a where
a :: *, so you can't match up the kinds. error is actually extremely
special in this case: it lives in OpenKind and matches both * and #. But
let's suppose that we s/undefined/error "foo"/...

>     let !mv# = undefined
>
> which signals that evaluation is occurring.

Also not true. Because error "foo" is inferred to have kind #, the bang
pattern happens implicitly.

> So it is impossible to accidentally go from:
>
>     main = do
>       let mv = MV undefined
>       putStrLn "Okay."
>
> which prints "Okay.", to something that throws an exception, without
> having a pretty good indication that you're doing so. I would guess
> this is desirable, so perhaps it should be mandated for Unlifted as
> well.

Nope, if you just float the error call out of MV, you will go from
"Okay." to an exception. Notice that *data constructors* are what are
used to induce suspension. This is why we don't have a 'suspend' special
form; instead, 'Box' is used directly.

> However, the above point confuses me with respect to another example.
> The proposal says that:
>
>     data Id :: * -> Unlifted where
>       Id :: a -> Id a
>
> could/should be compiled with no overhead over `a`, like a newtype.
> However, if Unlifted things have operational semantics like #, what
> does the following do:
>
>     let x :: Id a
>         !x = Id undefined
>
> The ! should evaluate to the Id constructor, but we're not
> representing it, so it actually doesn't evaluate anything? But:

That's correct. Id is a box containing a lifted value. The box is
unlifted, but the inner value can be lifted.

>     let x :: Id a
>         !x = undefined
>
> throws an exception?

Yes, exactly.

> Whereas for newtypes, both throw exceptions with
> a !x definition, or don't with an x definition?

Also correct. The key thing is to distinguish error in kind * and error
in kind #.
You can make a table:

                     | Id (error "foo")      | error "foo"       |
---------------------+-----------------------+-------------------+
newtype Id :: * -> * | error "foo" :: *      | error "foo" :: *  |
data Id :: * -> #    | Id (error "foo" :: *) | error "foo" :: #  |

> Is it actually > possible to make Id behave this way without any representational > overhead? Yes. The reason is that an error "foo" :: # *immediately fails* (rather than attempt to allocate an Id). So the outer level of error doesn't ever need to be represented on the heap, so we can just represent the inner liftedness. Here's another way of looking at it: error in kind # is not a bottom at all. It's just a way of bailing immediately. HOWEVER... > I'm a little skeptical. I think that only Force (and Box) might be > able to have no representational overhead. It seems like it might be easier to explain if just Force and Box get optimized, and we don't bother with others; I only really care about those two operators being optimized. Edward From andrew.gibiansky at gmail.com Sat Sep 5 08:39:02 2015 From: andrew.gibiansky at gmail.com (Andrew Gibiansky) Date: Sat, 5 Sep 2015 01:39:02 -0700 Subject: Proposal: Argument Do Message-ID: Trac: https://ghc.haskell.org/trac/ghc/ticket/10843 I would like the following to be valid Haskell code: main = when True do putStrLn "Hello!" Instead of requiring a dollar sign before the "do". This would parse as main = when True (do putStrLn "Hello!") Has this been tried before? It seems fairly simple -- is there some complexity I'm missing? I've always been confused as to why the parser requires `$` there, and I've heard a lot of others ask about this as well. Perhaps we could fix that? PS. Regardless of whether this goes anywhere, it was fun to learn how to hack on GHC. It was surprisingly easy; I wrote up my experience here. The GHC wiki is outstanding; pretty much every intro question about GHC development I had was answered on a fairly easy-to-find wiki page.
(Except for some stuff related to generating documentation and docbook, but whatever.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmiedema at gmail.com Sat Sep 5 09:32:41 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Sat, 5 Sep 2015 11:32:41 +0200 Subject: Proposal: Argument Do In-Reply-To: References: Message-ID: Hi Andrew, thank you for the write-up. There are some good hints in there for how to make the documentation better. If you had used `BuildFlavour = stage2`, as the Newcomers page suggests, you'd have had less trouble. I'll go and edit the HowtomakeGHCbuildquickly section, because it is outdated. > From the Newcomers page, it's not quite clear exactly how to make it only build Stage 2, even though it suggests doing so. The newcomers page says: - ## edit build.mk to remove the comment marker # on the line stage=2 - To speed up the development cycle, the final edit of build.mk makes sure that only the stage-2 compiler will be rebuild after this (see here about stages). Maybe you missed the comment about editing build.mk? Can you make suggestions for how to make this clearer? I added some whitespace, but I'm not sure that's enough. Thanks, Thomas > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmiedema at gmail.com Sat Sep 5 09:43:37 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Sat, 5 Sep 2015 11:43:37 +0200 Subject: Proposal: Argument Do In-Reply-To: References: Message-ID: > > If you had used `BuildFlavour = stage2` as the Newcomers page suggests, > you'd have had less trouble. > That should say `BuildFlavour = devel2`. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From andrew.gibiansky at gmail.com Sat Sep 5 14:14:56 2015 From: andrew.gibiansky at gmail.com (Andrew Gibiansky) Date: Sat, 5 Sep 2015 07:14:56 -0700 Subject: Proposal: Argument Do In-Reply-To: References: Message-ID: Thomas, Thanks for cleaning stuff up on the Newcomers page and others. I think all the things that were somewhat confusing before are now much clearer and less vague. -- Andrew On Sat, Sep 5, 2015 at 2:43 AM, Thomas Miedema wrote: > If you had used `BuildFlavour = stage2` as the Newcomers page suggests, >> you'd have had some less trouble. >> > > That should say `BuildFlavour = devel2`. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan.doel at gmail.com Sat Sep 5 17:35:44 2015 From: dan.doel at gmail.com (Dan Doel) Date: Sat, 5 Sep 2015 13:35:44 -0400 Subject: Unlifted data types In-Reply-To: <1441436053-sup-5590@sabre> References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <1441436053-sup-5590@sabre> Message-ID: On Sat, Sep 5, 2015 at 3:06 AM, Edward Z. Yang wrote: >> If we examine an analogue of some of your examples: >> >> data MutVar a = MV (MutVar# RealWorld a) >> >> main = do >> let mv# = undefined >> let mv = MV mv# >> putStrLn "Okay." >> >> The above is illegal. Instead we _must_ write: > > This doesn't typecheck, but for a different reason: undefined :: a > where a :: *, so you can't match up the kinds. > > error is actually extremely special in this case: it lives in OpenKind > and matches both * and #. But let's suppose that we > s/undefined/error "foo"/... > >> let !mv# = undefined >> >> which signals that evaluation is occurring. > > Also not true. Because error "foo" is inferred to have kind #, the bang > pattern happens implicitly. I tried with `error` first, and it worked exactly the way I described. But I guess it's a type inference weirdness. 
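[Editor's note: for reference, here is the example under discussion as a standalone program — my reconstruction, using `error` rather than `undefined`, since as noted earlier in the thread only `error` is kind-flexible enough to inhabit `MutVar# RealWorld a`.]

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (MutVar#, RealWorld)

-- A lifted wrapper around an unlifted primitive mutable variable.
data MutVar a = MV (MutVar# RealWorld a)

main :: IO ()
main = do
  -- The lazy let suspends the whole constructor application, so the
  -- error standing in for the MutVar# field is never evaluated:
  let _mv = MV (error "never forced")
  putStrLn "Okay."
```

Floating the error out into its own unlifted binding (`let mv# = error "..." :: MutVar# RealWorld a`) is what turns this into an immediate exception, since an unlifted binding is evaluated on the spot.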
If I annotate mv# with MutVar# it will work, whereas otherwise it will be inferred that mv# :: a where a :: *, instead of #. Whereas !x is a pattern which requires monomorphism of x, and so it figures out mv# :: MutVar# .... Kind of an odd corner case where breaking cycles causes things _not_ to type check, due to open kinds not being first class. I thought I remembered that at some point it was decided that `let` bindings of unboxed things should be required to have bangs on the bindings, to indicate the evaluation order. Maybe I'm thinking of something else (was it that it was originally required and we got rid of it?). > Nope, if you just float the error call out of MV, you will go from > "Okay." to an exception. Notice that *data constructors* are what are > used to induce suspension. This is why we don't have a 'suspend' > special form; instead, 'Box' is used directly. I know that it's the floating that makes a difference, not the bang pattern. The point would be to make the syntax require the bang pattern to give a visual indication of when it happens, and make it illegal to look like you're doing a normal let that doesn't change the value (although having it actually be a bang pattern would be bad, because it'd restrict polymorphism of the definition). Also, the constructor isn't exactly relevant, so much as whether the unlifted error occurs inside the definition of a lifted thing. For instance, we can go from: let mv = MutVar undefined to: let mv = let mv# :: MutVar# RealWorld a ; mv# = undefined in MutVar mv# and the result is the same, because it is the definition of mv that is lazy. Constructors in complex expressions---and all subexpressions for that matter---just get compiled this way. E.G. let f :: MutVar# RealWorld a -> MutVar a f mv# = f mv# in flip const (f undefined) $ putStrLn "okay" No constructors involved, but no error. >> Is it actually >> possible to make Id behave this way without any representational >> overhead? > > Yes. 
The reason is that an error "foo" :: # *immediately fails* (rather > than attempt to allocate an Id). So the outer level of error doesn't > ever need to be represented on the heap, so we can just represent the > inner liftedness. Okay. So, there isn't representational overhead, but there is overhead, where you call a function or something (which will just return its argument), whereas newtype constructors end up not having any cost whatsoever? -- Dan From singpolyma at singpolyma.net Sat Sep 5 20:06:53 2015 From: singpolyma at singpolyma.net (Stephen Paul Weber) Date: Sat, 5 Sep 2015 20:06:53 +0000 Subject: more releases In-Reply-To: References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> Message-ID: <20150905200653.GC7303@singpolyma.net> >having a large number of versions of GHC out there can make it difficult >for library authors, package curators, and large open source projects, due >to variety of what people are using. For point releases, if we do it right, this *should* not happen, since the changes *should* be backwards-compatible and so testing against the oldest release on the current major version *should* mean all subsequent point releases work as well. IMHO, any violation of this assumption *should* be considered a (serious) bug. From hvr at gnu.org Sun Sep 6 14:06:00 2015 From: hvr at gnu.org (Herbert Valerio Riedel) Date: Sun, 06 Sep 2015 16:06:00 +0200 Subject: Arcanist "lite" Haskell reimplementation (was: Proposal: accept pull requests on GitHub) In-Reply-To: (Thomas Miedema's message of "Thu, 3 Sep 2015 11:53:40 +0200") References: Message-ID: <87si6rprqv.fsf@gnu.org> On 2015-09-03 at 11:53:40 +0200, Thomas Miedema wrote: [...] > In my opinion it's a waste of our time trying to improve `arc` (it is > 34000 lines of PHP btw + another 70000 LOC for libphutil), when `pull > requests` are an obvious alternative that most of the Haskell community > already uses. [...]
I went ahead wasting some time and hacked up `arc-lite` for fun: https://github.com/haskell-infra/arc-lite It's currently at 407 Haskell SLOCs according to sloccount(1), and emulates the `arc` CLI as a drop-in replacement. As a proof-of-concept I've implemented the 3 simple operations - `arc install-certificate` - `arc list` - `arc call-conduit` If we wasted even more time, this could result in - Simplify installation of Arcanist for GHC contributors via Hackage (i.e. just `cabal install arc-lite`) - Implement a simple `arc diff`-like operation for submitting patches to Phabricator - Implement convenience operations tailored to GHC development - Teach arc-lite to behave in a more Git-idiomatic way - Make `arc-lite` automatically manage multi-commit code-reviews by splitting them up and submitting them as multiple inter-dependent code-revisions - ... Any comments? Cheers, hvr --8<---------------cut here---------------start------------->8--- arc-lite - Arcanist "lite" (CLI tool for Phabricator) Usage: arc-lite [--verbose] [--conduit-token TOKEN] [--conduit-uri URI] COMMAND Available options: -h,--help Show this help text --verbose Whether to be verbose --conduit-token TOKEN Ignore configured credentials and use an explicit API token instead --conduit-uri URI Ignore configured Conduit URI and use an explicit one instead Available commands: list List your open Differential revisions call-conduit Perform raw Conduit method call install-certificate Installs Conduit credentials into your ~/.arcrc for the given install of Phabricator --8<---------------cut here---------------end--------------->8--- From dan.doel at gmail.com Sun Sep 6 20:56:35 2015 From: dan.doel at gmail.com (Dan Doel) Date: Sun, 6 Sep 2015 16:56:35 -0400 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <1441436053-sup-5590@sabre> Message-ID: On Sat, Sep 5, 2015 at 1:35 PM, Dan Doel wrote: > Also, the constructor isn't exactly
relevant, so much as whether the > unlifted error occurs inside the definition of a lifted thing. So, in light of this, `Box` is not necessary to define `suspend`. We can simply write: suspend :: Force a -> a suspend (Force x) = x and the fact that `a` has kind * means that `suspend undefined` only throws an exception if you inspect it. `Box` as currently defined (not the previous GADT definition) is novel in that it allows you to suspend unlifted types that weren't derived from `Force`. And it would probably be useful to have coercions between `Box (Force a)` and `a`, and `Force (Box u)` and `u`. But (I think) it is not necessary for mediating between `Force a` and `a`. -- Dan From simonpj at microsoft.com Mon Sep 7 08:17:12 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 08:17:12 +0000 Subject: Thanks to Reid and Thomas Message-ID: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> Thomas, Reid, As I get back from ICFP, I?d like to take the opportunity to thank you for huge amount of work that you two personally have put into GHC recently. Your interventions are always thoughtful, supportive, and on target. GHC is a huge project, and lots of people contribute to it. I am truly grateful to all of them. But you two have been particularly active in the last year and I wanted to say thank you. Onward and upward, Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.knop.nospam at gmail.com Mon Sep 7 09:31:19 2015 From: william.knop.nospam at gmail.com (William Knop) Date: Mon, 7 Sep 2015 05:31:19 -0400 Subject: Thanks to Reid and Thomas In-Reply-To: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> References: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <0699E066-8024-43E3-8451-D0990B11EA4C@gmail.com> Onward and upward! 
Those who are dedicated to getting things done on a day to day basis -- you have done a great service for us all and I can't properly express my appreciation. Making GHC sensible to the rest of us is so important. Those who presented have enlightened and excited. I especially look forward to the confluence of automated static complexity analysis and supercompilation, as well as the ideas surrounding "levity" in dependent type theory. I idly wonder about how the ideas from homotopy type theory WRT cubical sets might fit in. Truly interesting stuff. Cheers and thank you for your hard work, Will > On Sep 7, 2015, at 4:17 AM, Simon Peyton Jones wrote: > > Thomas, Reid, > > As I get back from ICFP, I'd like to take the opportunity to thank you for the huge amount of work that you two personally have put into GHC recently. Your interventions are always thoughtful, supportive, and on target. > > GHC is a huge project, and lots of people contribute to it. I am truly grateful to all of them. But you two have been particularly active in the last year and I wanted to say thank you. > > Onward and upward, > > Simon > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at joachim-breitner.de Mon Sep 7 11:02:02 2015 From: mail at joachim-breitner.de (Joachim Breitner) Date: Mon, 07 Sep 2015 13:02:02 +0200 Subject: RFC: Unpacking sum types In-Reply-To: References: Message-ID: <1441623722.1570.25.camel@joachim-breitner.de> Hi, On Tuesday, 2015-09-01 at 10:23 -0700, Johan Tibell wrote: > I have a draft design for unpacking sum types that I'd like some > feedback on. In particular feedback both on: > > * the writing and clarity of the proposal and > * the proposal itself.
> > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes The current proposed layout for a data D a = D a {-# UNPACK #-} !(Maybe a) would be [D's pointer] [a] [tag (0 or 1)] [Just's a] So the representation of D foo (Just bar) is [D_info] [&foo] [1] [&bar] and of D foo Nothing is [D_info] [&foo] [0] [&dummy] where dummy is something that makes the GC happy. But assuming this dummy object is something that is never a valid heap object of its own, then this should be sufficient to distinguish the two cases, and we could actually have that the representation of D foo (Just bar) is [D_info] [&foo] [&bar] and of D foo Nothing is [D_info] [&foo] [&dummy] and a case analysis on D would compare the pointer in the third word with the well-known address of dummy to determine if we have Nothing or Just. This saves one word. If we generate a number of such static dummy objects, we can generalize this tag-field avoiding trick to other data types than Maybe. It seems that it is worth doing that if * the number of constructors is no more than the number of static dummy objects, and * there is one constructor which has more pointer fields than all other constructors. Also, this trick cannot be applied repeatedly: If we have data D a = D {-# UNPACK #-} !(Maybe a) | D'Nothing data E a = E {-# UNPACK #-} !(D a) then it cannot be applied when unpacking D into E. (Or maybe it can, but care has to be taken that D's Nothing is represented by a different dummy object than Maybe's Nothing.) Anyways, this is an optimization that can be implemented once unboxed sum types are finished and working reliably. Greetings, Joachim -- Joachim "nomeata" Breitner mail at joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata at joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata at debian.org -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From simonpj at microsoft.com Mon Sep 7 11:56:07 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 11:56:07 +0000 Subject: Thanks to Reid and Thomas In-Reply-To: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> References: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <729f3b24078b4732b2a03e521755560f@DB4PR30MB030.064d.mgd.msft.net> PS: auto-complete failed me. I meant Reid Barton, not Reinhard Wilhelm, of course :-). Sorry Reid. Simon From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Simon Peyton Jones Sent: 07 September 2015 09:17 To: Thomas Miedema; Reinhard Wilhelm Cc: ghc-devs at haskell.org Subject: Thanks to Reid and Thomas Thomas, Reid, As I get back from ICFP, I'd like to take the opportunity to thank you for the huge amount of work that you two personally have put into GHC recently. Your interventions are always thoughtful, supportive, and on target. GHC is a huge project, and lots of people contribute to it. I am truly grateful to all of them. But you two have been particularly active in the last year and I wanted to say thank you. Onward and upward, Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmiedema at gmail.com Mon Sep 7 12:33:54 2015 From: thomasmiedema at gmail.com (Thomas Miedema) Date: Mon, 7 Sep 2015 14:33:54 +0200 Subject: Thanks to Reid and Thomas In-Reply-To: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> References: <4ae0e9fa716745f8b741e0a877ff6611@DB4PR30MB030.064d.mgd.msft.net> Message-ID: On Mon, Sep 7, 2015 at 10:17 AM, Simon Peyton Jones wrote: > Thomas, Reid, > > As I get back from ICFP, I'd like to take the opportunity to thank you for > the huge amount of work that you two personally have put into GHC recently.
> Your interventions are always thoughtful, supportive, and on target. > > > GHC is a huge project, and lots of people contribute to it. I am truly > grateful to all of them. But you two have been particularly active in the > last year and I wanted to say thank you. > Thank you for the kind words. -------------- next part -------------- An HTML attachment was scrubbed... URL: From simonpj at microsoft.com Mon Sep 7 13:47:01 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 13:47:01 +0000 Subject: Proposal: accept pull requests on GitHub In-Reply-To: References: <55E7453A.90309@gmail.com> <87mvx4mu2x.fsf@andromedae.feelingofgreen.ru> <55E76572.3050405@nh2.me> Message-ID: <1469c7be53ed4f0dab3872de9fe5ad54@DB4PR30MB030.064d.mgd.msft.net> I am very much at the ignorant end of this debate: I'll just use whatever I'm told to use. But I do resonate with this observation from Austin: | For one, having two code review tools of any form is completely | bonkers, TBQH. This is my biggest 'obvious' blocker. If we're going to | switch, we should just switch. Having to have people decide how to | contribute with two tools is as crazy as having two VCSs and just a | way of asking people to get *more* confused, and have us answer more | questions. That's something we need to avoid. As a code contributor and reviewer, this is awkward. As a contributor, how do I choose? As a reviewer I'm presumably forced to learn both tools. But I'll go with the flow... I do not have a well-informed opinion about the tradeoffs. (I'm tempted naively to ask: is there an automated way to go from a GitHub PR to a Phab ticket? Then we could convert the former (if someone wants to submit that way) into the latter.) 
Simon | -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of | Austin Seipp | Sent: 03 September 2015 05:42 | To: Niklas Hamb?chen | Cc: Simon Marlow; ghc-devs at haskell.org | Subject: Re: Proposal: accept pull requests on GitHub | | (JFYI: I hate to announce my return with a giant novel of negative- | nancy-ness about a proposal that just came up. I'm sorry about this!) | | TL;DR: I'm strongly -1 on this, because I think it introduces a lot of | associated costs for everyone, the benefits aren't really clear, and I | think it obscures the real core issue about "how do we get more | contributors" and how to make that happen. Needless to say, GitHub | does not magically solve both of these AFAICS. | | As is probably already widely known, I'm fairly against GitHub because | I think at best its tools are mediocre and inappropriate for GHC - but | I also don't think this proposal or the alternatives stemming from it | are very good, and that it reduces visibility of the real, core | complaints about what is wrong. Some of those problems may be with | Phabricator, but it's hard to sort the wheat from the chaff, so to | speak. | | For one, having two code review tools of any form is completely | bonkers, TBQH. This is my biggest 'obvious' blocker. If we're going to | switch, we should just switch. Having to have people decide how to | contribute with two tools is as crazy as having two VCSs and just a | way of asking people to get *more* confused, and have us answer more | questions. That's something we need to avoid. | | For the same reason, I'm also not a fan of 'use third party thing to | augment other thing to remove its deficiencies making it OK', because | the problem is _it adds surface area_ and other problems in other | cases. It is a solution that should be considered a last resort, | because it is a logical solution that applies to everything. 
If we | have a bot that moves GH PRs into Phab and then review them there, the | surface area of what we have to maintain and explain has suddenly | exploded: because now instead of 1 thing we have 3 things (GH, Phab, | bot) and the 3 interactions between them, for a multiplier of *six* | things we have to deal with. And then we use reviewable.io, because GH | reviews are terrible, adding a 4th mechanism? It's rube goldberg-ian. | We can logically 'automate' everything in all ways to make all | contributors happy, but there's a real *cognitive* overhead to this | and humans don't scale as well as computers do. It is not truly | 'automated away' if the cognitive burden is still there. | | I also find it extremely strange to tell people "By the way, this | method in which you've contributed, as was requested by community | members, is actually a complete proxy for the real method of | contributing, you can find all your imported code here". How is this | supposed to make contribution *easier* as opposed to just more | confusing? Now you've got the impression you're using "the real thing" | when in reality it's shoved off somewhere else to have the nitpicking | done. Just using Phabricator would be less complicated, IMO, and much | more direct. | | The same thing goes for reviewable.io. Adding it as a layer over | GitHub just makes the surface area larger, and puts less under our | control. And is it going to exist in the same form in 2 or 3 years? | Will it continue to offer the same tools, the same workflows that we | "like", and what happens when we hit a wall? It's easy to say | "probably" or "sure" to all this, until we hit something we dislike | and have no possibility of fixing. | | And once you do all this, BTW, you can 'never go back'. It seems so | easy to just say 'submit pull requests' once and nothing else, right? | Wrong.
Once you commit to that infrastructure, it is *there* and | simply taking it out from under the feet of those using it is not only | unfortunate, it is *a huge timesink to undo it all*. Which amounts to | it never happening. Oh, but you can import everything elsewhere! The | problem is you *can't* import everything, but more importantly you | can't *import my memories in another way*, so it's a huge blow to | contributors to ask them about these mental time sinks, then to forget | them all. And as your project grows, this becomes more of a memory as | you made a first and last choice to begin with. | | Phabricator was 'lucky' here because it had the gateway into being the | first review tool for us. But that wasn't because it was *better* than | GitHub. It was because we were already using it, and it did not | interact badly with our other tools or force us to compromise things - | so the *cost* was low. The cost is immeasurably higher by default | against GitHub because of this, at least to me. That's just how it is | sometimes. | | Keep in mind there is a cost to everything and how you fix it. GitHub | is not a simple patch to add a GHC feature. It is a question that | fundamentally concerns itself with the future of the project for a | long time. The costs must be analyzed more aggressively. Again, | Phabricator had 'first child' preferential treatment. That's not | something we can undo now. | | I know this sounds like a lot of ad hoc mumbo jumbo, but please bear | with me: we need to identify the *root issue* here to fix it. | Otherwise we will pay for the costs of an improper fix for a long | time, and we are going to keep having this conversation over, and over | again. And we need to weigh in the cost of fixing it, which is why I | mention that so much. | | So with all this in mind, you're back to just using GitHub. But again | GitHub is quite mediocre at best. So what is the point of all this? 
| It's hinted at here: | | > the number of contributions will go up, commits will be smaller, and | there will be more of them per pull request (contributors will be able | to put style changes and refactorings into separate commits, without | jumping through a bunch of hoops). | | The real hint is that "the number of contributions will go up". That's | a noble goal and I think it's at the heart of this proposal. | | Here's the meat of the question: what is the cost of achieving this | goal? That is, what amount of work is sufficient to make this goal | realizable, and finally - why is GitHub *the best use of our time for | achieving this?* That's one aspect of the cost - that it's the best | use of the time. I feel like this is fundamentally why I always seem | to never 'get' this argument, and I'm sure it's very frustrating on | behalf of the people who have talked to me about it and like GitHub. | But I feel like I've never gotten a straight answer for GHC. | | If the goal is actually "make more people contribute", that's pretty | broad. I can make that very easy: give everyone who ever submits a | patch push access. This is a legitimate way to run large projects that | has worked. People will almost certainly be more willing to commit, | especially when overhead on patch submission is reduced so much. Why | not just do that instead? It's not like we even mandate code review, | although we could. You could reasonably trust CI to catch and revert | things a lot of the time for people who commit directly to master. We | all do it sometimes. | | I'm being serious about this. I can start doing that tomorrow because | the *cost is low*, both now and reasonably speaking into some | foreseeable future. It is one of many solutions to the raw heart of the | proposal.
GitHub is not a low cost move, but also, it is a *long term | cost* because of the technical deficiencies it won't aim to address | (merge commits are ugly, branch reviews are weak, ticket/PR namespace | overlaps with Trac, etc etc) or that we'll have to work around. | | That means that if we want GitHub to fix the "give us more | contributors" problem, and it has a high cost, it not only has _to fix | the problem_, it also has to do that well enough to offset its cost. I | don't think it's clear that is the case right now, among a lot of | other solutions. | | I don't think the root issue is "We _need_ GitHub to get more | contributors". It sounds like the complaint is more "I don't like how | Phabricator works right now". That's an important distinction, because | the latter is not only more specific, it's more actionable: | | - Things like Arcanist can be tracked as a Git submodule. There is | little to no pain in this, it's low cost, and it can always be | synchronized with Phabricator. This eliminates the "Must clone | arcanist" and "need to upgrade arcanist" points. | | - Similarly when Phabricator sometimes kills a lot of builds, it's | because I do an upgrade. That's mostly an error on my part and I can | simply schedule upgrades regularly, barring hotfixes or somesuch. That | should basically eliminate these. The other build issues are from | picking the wrong base commit from the revision, I think, which I | believe should be fixable upstream (I need to get a solid example of | one that isn't a mega ultra patch.) | | - If Harbormaster is not building dependent patches as mentioned in | WhyNotPhabricator, that is a bug, and I have not been aware of it. | Please make me aware of it so I can file bugs! I seriously don't look | at _every_ patch, I need to know this. That could have probably been | fixed ASAP otherwise. | | - We can get rid of the awkwardness of squashes etc by using | Phabricator's "immutable" history, although it introduces merge | commits. 
Whether this is acceptable is up to debate (I dislike merge | commits, but could live with it). | | - I do not understand point #3, about answering questions. Here's | the reality: every single one of those cases is *almost always an | error*. That's not a joke. Forgetting to commit a file, amending | changes in the working tree, and specifying a reviewer are all total | errors as it stands today. Why is this a minus? It catches a useful | class of 'interaction bugs'. If it's because sometimes Phabricator | yells about build arifacts in the tree, those should be .gitignore'd. | If it's because you have to 'git stash' sometimes, this is fairly | trivial IMO. Finally, specifying reviewers IS inconvenient, but | currently needed. We could easily assign a '#reviewers' tag that would | add default reviewers. | - In the future, Phabricator will hopefully be able to | automatically assign the right reviewers to every single incoming | patch, based on the source file paths in the tree, using the Owners | tool. Technically, we could do that today if we wanted, it's just a | little more effort to add more Herald rules. This will be far, far | more robust than anything GitHub can offer, and eliminates point #3. | | - Styling, linting etc errors being included, because reviews are | hard to create: This is tangential IMO. We need to just bite the | bullet on this and settle on some lint and coding styles, and apply | them to the tree uniformly. The reality is *nobody ever does style | changes on their own*, and they are always accompanied by a diff, and | they always have to redo the work of pulling them out, Phab or not. | Literally 99% of the time we ask for this, it happens this way. | Perhaps instead we should just eliminate this class of work by just | running linters over all of the source code at once, and being happy | with it. | | Doing this in fact has other benefits: like `arc lint` will always | _correctly_ report when linting errors are violated. 
And we can reject
patches that violate them, because they will always be accurate.

  - As for some of the quotes, some of them are funny, but the real
message lies in the context. :) In particular, there have been several
cases (such as the DWARF work) where the idea was "write 30 commits
and put them on Phabricator". News flash: *this is bad*, no matter
whether you're using Phabricator or not, because it makes reviewing
the whole thing immensely difficult from a reviewer perspective. The
point here is that we can clear this up by being more communicative
about what we expect of authors of large patches, and communicating
your intent ASAP so we can get patches in as fast as possible. Writing
a patch is the easiest part of the work.

And more:

  - Clean up the documentation, it's a mess. It feels nice that
everything has clear, lucid explanations on the wiki, but the wiki is
ridiculously massive and we have a tendency for 'link creep' where we
spread things out. The contributors docs could probably stand to be
streamlined. We would have to do this anyway, moving to GitHub or not.

  - Improve the homepage, directly linking to this aforementioned
page.

  - Make it clear what we expect of contributors. I feel like a lot of
this could be explained by having a 5 minute drive-by guide for
patches, and then a longer 10-minute guide about A) How to style
things, B) How to format your patches if you're going to contribute
regularly, C) Why it is this way, and D) finally links to all the
other things you need to know. People going into Phabricator expecting
it to behave like GitHub is a problem (more a cultural problem IMO but
that's another story), and if this can't be directly fixed, the best
thing to do is make it clear why it isn't.

Those are just some of the things OTTOMH, but this email is already
way too long.
This is what I mean though: fixing most of these is
going to have *seriously smaller cost* than moving to GitHub. It does
not account for "The GitHub factor" of people contributing "just
because it's on GitHub", but again, that value has to outweigh the
other costs. I'm not seriously convinced it does.

I know it's work to fix these things. But GitHub doesn't really
magically make a lot of our needs go away, and it's not going to
magically fix things like style or lint errors, the fact Travis-CI is
still pretty insufficient for us in the long term (and Harbormaster is
faster, on our own hardware, too), or that it will cause needlessly
higher amounts of spam through Trac and GitHub itself. I don't think
settling on it as - what seems to be - a first resort, is a really
good idea.

On Wed, Sep 2, 2015 at 4:09 PM, Niklas Hambüchen wrote:
> On 02/09/15 22:42, Kosyrev Serge wrote:
>> As a wild idea -- did anyone look at /Gitlab/ instead?
>
> Hi, yes. It does not currently have a sufficient review
> functionality
> (cannot handle multiple revisions easily).
>
> On 02/09/15 20:51, Simon Marlow wrote:
>> It might feel better
>> for the author, but discovering what changed between two branches
>> of
>> multiple commits on github is almost impossible.
>
> I disagree with the first part of this: When the UI of the review
> tool
> is good, it is easy to follow. But there's no open-source
> implementation of that around.
>
> I agree that it is not easy to follow on Github.
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>

--
Regards,

Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/
_______________________________________________
ghc-devs mailing list
ghc-devs at haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From simonpj at microsoft.com Mon Sep 7 13:59:54 2015
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Mon, 7 Sep 2015 13:59:54 +0000
Subject: ArrayArrays
In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com>
Message-ID: <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net>

It was fun to meet and discuss this.

Did someone volunteer to write a wiki page that describes the proposed
design? And, I earnestly hope, also describes the menagerie of
currently available array types and primops so that users can have
some chance of picking the right one?!

Thanks

Simon

From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ryan Newton
Sent: 31 August 2015 23:11
To: Edward Kmett; Johan Tibell
Cc: Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates
Subject: Re: ArrayArrays

Dear Edward, Ryan Yates, and other interested parties --

So when should we meet up about this?

May I propose the Tues afternoon break for everyone at ICFP who is
interested in this topic? We can meet out in the coffee area and
congregate around Edward Kmett, who is tall and should be easy to find
;-).

I think Ryan is going to show us how to use his new primops for
combined array + other fields in one heap object?

On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett > wrote:
Without a custom primitive it doesn't help much there, you have to
store the indirection to the mask.
With a custom primitive it should cut the on heap root-to-leaf path of
everything in the HAMT in half. A shorter HashMap was actually one of
the motivating factors for me doing this. It is rather astoundingly
difficult to beat the performance of HashMap, so I had to start
cheating pretty badly. ;)

-Edward

On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > wrote:

I'd also be interested to chat at ICFP to see if I can use this for my
HAMT implementation.

On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > wrote:

Sounds good to me. Right now I'm just hacking up composable accessors
for "typed slots" in a fairly lens-like fashion, and treating the set
of slots I define and the 'new' function I build for the data type as
its API, and build atop that. This could eventually graduate to
template-haskell, but I'm not entirely satisfied with the solution I
have. I currently distinguish between what I'm calling "slots" (things
that point directly to another SmallMutableArrayArray# sans wrapper)
and "fields" which point directly to the usual Haskell data types
because unifying the two notions meant that I couldn't lift some
coercions out "far enough" to make them vanish.

I'll be happy to run through my current working set of issues in
person and -- as things get nailed down further -- in a longer lived
medium than in personal conversations. ;)

-Edward

On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > wrote:

I'd also love to meet up at ICFP and discuss this. I think the array
primops plus a TH layer that lets us (ab)use them many times without
too much marginal cost sounds great. And I'd like to learn how we
could be either early users of, or help with, this infrastructure.

CC'ing in Ryan Scott and Omer Agacan who may also be interested in
dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D.
student who is currently working on concurrent data structures in
Haskell, but will not be at ICFP.

On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > wrote:

I completely agree.
I would love to spend some time during ICFP and
friends talking about what it could look like. My small array for STM
changes for the RTS can be seen here [1]. It is on a branch somewhere
between 7.8 and 7.10 and includes irrelevant STM bits and some
confusing naming choices (sorry), but should cover all the details
needed to implement it for a non-STM context. The biggest surprise for
me was following small array too closely and having a word/byte offset
mismatch [2].

[1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut
[2]: https://ghc.haskell.org/trac/ghc/ticket/10413

Ryan

On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett > wrote:
> I'd love to have that last 10%, but it's a lot of work to get there and more
> importantly I don't know quite what it should look like.
>
> On the other hand, I do have a pretty good idea of how the primitives above
> could be banged out and tested in a long evening, well in time for 7.12. And
> as noted earlier, those remain useful even if a nicer typed version with an
> extra level of indirection to the sizes is built up after.
>
> The rest sounds like a good graduate student project for someone who has
> graduate students lying around. Maybe somebody at Indiana University who has
> an interest in type theory and parallelism can find us one. =)
>
> -Edward
>
> On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates > wrote:
>>
>> I think from my perspective, the motivation for getting the type
>> checker involved is primarily bringing this to the level where users
>> could be expected to build these structures. It is reasonable to
>> think that there are people who want to use STM (a context with
>> mutation already) to implement a straightforward data structure that
>> avoids extra indirection penalty. There should be some places where
>> knowing that things are field accesses rather than array indexing
>> could be helpful, but I think GHC is good right now about handling
>> constant offsets.
In my code I don't do any bounds checking as I know >> I will only be accessing my arrays with constant indexes. I make >> wrappers for each field access and leave all the unsafe stuff in >> there. When things go wrong though, the compiler is no help. Maybe >> template Haskell that generates the appropriate wrappers is the right >> direction to go. >> There is another benefit for me when working with these as arrays in >> that it is quite simple and direct (given the hoops already jumped >> through) to play with alignment. I can ensure two pointers are never >> on the same cache-line by just spacing things out in the array. >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett > wrote: >> > They just segfault at this level. ;) >> > >> > Sent from my iPhone >> > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton > wrote: >> > >> > You presumably also save a bounds check on reads by hard-coding the >> > sizes? >> > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett > wrote: >> >> >> >> Also there are 4 different "things" here, basically depending on two >> >> independent questions: >> >> >> >> a.) if you want to shove the sizes into the info table, and >> >> b.) if you want cardmarking. >> >> >> >> Versions with/without cardmarking for different sizes can be done >> >> pretty >> >> easily, but as noted, the infotable variants are pretty invasive. >> >> >> >> -Edward >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > wrote: >> >>> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up >> >>> if >> >>> they were small enough and there are enough of them. You get a bit >> >>> better >> >>> locality of reference in terms of what fits in the first cache line of >> >>> them. >> >>> >> >>> -Edward >> >>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >> >>> wrote: >> >>>> >> >>>> Yes. 
And for the short term I can imagine places we will settle with >> >>>> arrays even if it means tracking lengths unnecessarily and >> >>>> unsafeCoercing >> >>>> pointers whose types don't actually match their siblings. >> >>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed sized >> >>>> array >> >>>> objects *other* than using them to fake structs? (Much to >> >>>> derecommend, as >> >>>> you mentioned!) >> >>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >> >>>> wrote: >> >>>>> >> >>>>> I think both are useful, but the one you suggest requires a lot more >> >>>>> plumbing and doesn't subsume all of the usecases of the other. >> >>>>> >> >>>>> -Edward >> >>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >> >>>>> wrote: >> >>>>>> >> >>>>>> So that primitive is an array like thing (Same pointed type, >> >>>>>> unbounded >> >>>>>> length) with extra payload. >> >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, >> >>>>>> especially >> >>>>>> with the extra payload at front. But wouldn't the general solution >> >>>>>> for >> >>>>>> structs be one that that allows new user data type defs for # >> >>>>>> types? >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >> >>>>>>> known >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >> >>>>>>> above, but >> >>>>>>> where the word counts were stored in the objects themselves. >> >>>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >> >>>>>>> likely >> >>>>>>> want to be something we build in addition to MutVar# rather than a >> >>>>>>> replacement. >> >>>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and build info >> >>>>>>> tables that knew them, and typechecker support, for instance, it'd >> >>>>>>> get >> >>>>>>> rather invasive. 
>> >>>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' versions >> >>>>>>> above, like working with evil unsized c-style arrays directly >> >>>>>>> inline at the >> >>>>>>> end of the structure cease to be possible, so it isn't even a pure >> >>>>>>> win if we >> >>>>>>> did the engineering effort. >> >>>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by adding the one >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >>>>>>> >> >>>>>>> -Edward >> >>>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton > >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable structs >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's hard. >> >>>>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of object >> >>>>>>>> identity problems. But what about directly supporting an >> >>>>>>>> extensible set of >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) >> >>>>>>>> MutVar#? That >> >>>>>>>> may be too much work, but is it problematic otherwise? >> >>>>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want best in >> >>>>>>>> class >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential >> >>>>>>>> counterparts. >> >>>>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >>>>>>>> > wrote: >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >> >>>>>>>>> maybe >> >>>>>>>>> make a ticket for it. 
>> >>>>>>>>>
>> >>>>>>>>> Thanks
>> >>>>>>>>>
>> >>>>>>>>> Simon
>> >>>>>>>>>
>> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com]
>> >>>>>>>>> Sent: 27 August 2015 16:54
>> >>>>>>>>> To: Simon Peyton Jones
>> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs
>> >>>>>>>>> Subject: Re: ArrayArrays
>> >>>>>>>>>
>> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It
>> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s.
>> >>>>>>>>>
>> >>>>>>>>> While those live in #, they are garbage collected objects, so this
>> >>>>>>>>> all lives on the heap.
>> >>>>>>>>>
>> >>>>>>>>> They were added to make some of the DPH stuff fast when it has to
>> >>>>>>>>> deal with nested arrays.
>> >>>>>>>>>
>> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing.
>> >>>>>>>>>
>> >>>>>>>>> The Problem
>> >>>>>>>>> -----------------
>> >>>>>>>>>
>> >>>>>>>>> Consider the scenario where you write a classic doubly-linked list
>> >>>>>>>>> in Haskell.
>> >>>>>>>>>
>> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL))
>> >>>>>>>>>
>> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers on
>> >>>>>>>>> the heap.
>> >>>>>>>>>
>> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe
>> >>>>>>>>> DLL ~> DLL
>> >>>>>>>>>
>> >>>>>>>>> That is 3 levels of indirection.
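[Editorial aside, not part of the original message: the naive representation described above can be written out in full as a runnable sketch. The helper names (`newNode`, `link`, `lengthForward`) are invented for illustration; only the `DLL` declaration comes from the thread.]

```haskell
-- Hedged sketch of the naive doubly-linked list described in the thread.
-- Each link is an IORef holding Maybe DLL, so reaching a neighbour chases
-- DLL ~> IORef (Maybe DLL) ~> MutVar# ~> Maybe DLL ~> DLL.
import Data.IORef

data DLL = DLL { prevRef :: IORef (Maybe DLL)
               , nextRef :: IORef (Maybe DLL) }

-- Allocate an unlinked node.
newNode :: IO DLL
newNode = DLL <$> newIORef Nothing <*> newIORef Nothing

-- Link two nodes so that b follows a.
link :: DLL -> DLL -> IO ()
link a b = do
  writeIORef (nextRef a) (Just b)
  writeIORef (prevRef b) (Just a)

-- Count nodes reachable going forward (assumes an acyclic chain).
-- Every step pays the indirections sketched above.
lengthForward :: DLL -> IO Int
lengthForward d = do
  mn <- readIORef (nextRef d)
  case mn of
    Nothing -> pure 1
    Just n  -> (1 +) <$> lengthForward n
```

Each `readIORef` here is the pointer chase the message is complaining about; the representations that follow in the thread exist to shorten that chain.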
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >> >>>>>>>>> worsening our representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This means that every operation we perform on this structure >> >>>>>>>>> will >> >>>>>>>>> be about half of the speed of an implementation in most other >> >>>>>>>>> languages >> >>>>>>>>> assuming we're memory bound on loading things into cache! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Making Progress >> >>>>>>>>> >> >>>>>>>>> ---------------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I have been working on a number of data structures where the >> >>>>>>>>> indirection of going from something in * out to an object in # >> >>>>>>>>> which >> >>>>>>>>> contains the real pointer to my target and coming back >> >>>>>>>>> effectively doubles >> >>>>>>>>> my runtime. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >> >>>>>>>>> MutVar# >> >>>>>>>>> onto the mutable list when we dirty it. There is a well defined >> >>>>>>>>> write-barrier. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I could change out the representation to use >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, >> >>>>>>>>> but >> >>>>>>>>> this doesn't help _much_ directly. 
It has reduced the amount of distinct
>> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per
>> >>>>>>>>> object to 2.
>> >>>>>>>>>
>> >>>>>>>>> I still have to go out to the heap from my DLL and get to the array
>> >>>>>>>>> object and then chase it to the next DLL and chase that to the
>> >>>>>>>>> next array. I do get my two pointers together in memory though. I'm
>> >>>>>>>>> paying for a card marking table as well, which I don't particularly
>> >>>>>>>>> need with just two pointers, but we can shed that with the
>> >>>>>>>>> "SmallMutableArray#" machinery added back in 7.10, which is just
>> >>>>>>>>> the old array code as a new data type, which can speed things up a
>> >>>>>>>>> bit when you don't have very big arrays:
>> >>>>>>>>>
>> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil
>> >>>>>>>>>
>> >>>>>>>>> But what if I wanted my object itself to live in # and have two
>> >>>>>>>>> mutable fields and be able to share the same write barrier?
>> >>>>>>>>>
>> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. What
>> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the
>> >>>>>>>>> impedance mismatch between the imperative world and Haskell, and
>> >>>>>>>>> then just let the ArrayArray#'s hold other arrayarrays.
>> >>>>>>>>>
>> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld)
>> >>>>>>>>>
>> >>>>>>>>> now I need to make up a new Nil, which I can just make be a special
>> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even abuse
>> >>>>>>>>> pattern synonyms. Alternately I can exploit the internals further
>> >>>>>>>>> to make this cheaper.
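[Editorial aside, not part of the original message: the intermediate "two pointers in a MutableArray#" representation from this discussion can be exercised directly with base's primops. This sketch deliberately uses `MutableArray#` rather than the `ArrayArray#` variants; the slot layout (slot 0 = previous, slot 1 = next) and the helper names are assumptions made for the example.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Hedged sketch: a node is a 2-slot MutableArray#; an unlinked slot
-- holds Nil. This mirrors "data DLL = DLL (MutableArray# RealWorld DLL)
-- | Nil" from the thread, without the ArrayArray# machinery.
import GHC.Exts
import GHC.IO (IO(..))

data DLL = DLL (MutableArray# RealWorld DLL) | Nil

-- Allocate a node with both slots pointing at Nil.
mkDLL :: IO DLL
mkDLL = IO $ \s -> case newArray# 2# Nil s of
  (# s', arr #) -> (# s', DLL arr #)

-- Read the "next" slot (slot 1).
getNext :: DLL -> IO DLL
getNext Nil     = pure Nil
getNext (DLL m) = IO $ \s -> readArray# m 1# s

-- Overwrite the "next" slot (slot 1).
setNext :: DLL -> DLL -> IO ()
setNext Nil     _ = pure ()
setNext (DLL m) d = IO $ \s -> case writeArray# m 1# d s of
  s' -> (# s', () #)
```

As the thread notes, GHC is happy to erase the lifted `DLL` wrapper in strict chains of such operations; the wrapper exists only to bridge from * into #.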
>> >>>>>>>>>
>> >>>>>>>>> Then I can use the readMutableArrayArray# and
>> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding and
>> >>>>>>>>> next entry in the linked list.
>> >>>>>>>>>
>> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a
>> >>>>>>>>> strict world, and everything there lives in #.
>> >>>>>>>>>
>> >>>>>>>>> next :: DLL -> IO DLL
>> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of
>> >>>>>>>>>   (# s', n #) -> (# s', DLL n #)
>> >>>>>>>>>
>> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to
>> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty easily
>> >>>>>>>>> when they are known strict and you chain operations of this sort!
>> >>>>>>>>>
>> >>>>>>>>> Cleaning it Up
>> >>>>>>>>> ------------------
>> >>>>>>>>>
>> >>>>>>>>> Now I have one outermost indirection pointing to an array that
>> >>>>>>>>> points directly to other arrays.
>> >>>>>>>>>
>> >>>>>>>>> I'm stuck paying for a card marking table per object, but I can fix
>> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using a
>> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a
>> >>>>>>>>> mixture of SmallMutableArray# fields and normal ones in the data
>> >>>>>>>>> structure. Operationally, I can even do so by just unsafeCoercing
>> >>>>>>>>> the existing SmallMutableArray# primitives to change the kind of
>> >>>>>>>>> one of the arguments it takes.
>> >>>>>>>>>
>> >>>>>>>>> This is almost ideal, but not quite. I often have fields that would
>> >>>>>>>>> be best left unboxed.
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently >> >>>>>>>>> at >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a >> >>>>>>>>> boxed or at a >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int in >> >>>>>>>>> question in >> >>>>>>>>> there. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having to >> >>>>>>>>> go off to >> >>>>>>>>> the side costs me the entire win from avoiding the first pointer >> >>>>>>>>> chase. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could >> >>>>>>>>> construct that had n words with unsafe access and m pointers to >> >>>>>>>>> other heap >> >>>>>>>>> objects, one that could put itself on the mutable list when any >> >>>>>>>>> of those >> >>>>>>>>> pointers changed then I could shed this last factor of two in >> >>>>>>>>> all >> >>>>>>>>> circumstances. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Prototype >> >>>>>>>>> >> >>>>>>>>> ------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Over the last few days I've put together a small prototype >> >>>>>>>>> implementation with a few non-trivial imperative data structures >> >>>>>>>>> for things >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and >> >>>>>>>>> order-maintenance. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> https://github.com/ekmett/structs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Notable bits: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of >> >>>>>>>>> link-cut >> >>>>>>>>> trees in this style. 
>> >>>>>>>>>
>> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that make
>> >>>>>>>>> it go fast.
>> >>>>>>>>>
>> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost all
>> >>>>>>>>> the references to the LinkCut or Object data constructor get
>> >>>>>>>>> optimized away, and we're left with beautiful strict code directly
>> >>>>>>>>> mutating our underlying representation.
>> >>>>>>>>>
>> >>>>>>>>> At the very least I'll take this email and turn it into a short
>> >>>>>>>>> article.
>> >>>>>>>>>
>> >>>>>>>>> -Edward
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones
>> >>>>>>>>> > wrote:
>> >>>>>>>>>
>> >>>>>>>>> Just to say that I have no idea what is going on in this thread.
>> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a
>> >>>>>>>>> ticket? Is there a wiki page?
>> >>>>>>>>>
>> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a good
>> >>>>>>>>> thing.
>> >>>>>>>>>
>> >>>>>>>>> Simon
>> >>>>>>>>>
>> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
>> >>>>>>>>> Edward Kmett
>> >>>>>>>>> Sent: 21 August 2015 05:25
>> >>>>>>>>> To: Manuel M T Chakravarty
>> >>>>>>>>> Cc: Simon Marlow; ghc-devs
>> >>>>>>>>> Subject: Re: ArrayArrays
>> >>>>>>>>>
>> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be
>> >>>>>>>>> very handy as well.
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider right now if I have something like an order-maintenance >> >>>>>>>>> structure I have: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Upper s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Lower s)) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and two >> >>>>>>>>> pointers, >> >>>>>>>>> one for forward and one for backwards. The latter is basically >> >>>>>>>>> the same >> >>>>>>>>> thing with a mutable reference up pointing at the structure >> >>>>>>>>> above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On the heap this is an object that points to a structure for the >> >>>>>>>>> bytearray, and points to another structure for each mutvar which >> >>>>>>>>> each point >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >> >>>>>>>>> indirection smeared >> >>>>>>>>> over everything. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link >> >>>>>>>>> from >> >>>>>>>>> the structure below to the structure above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and >> >>>>>>>>> the >> >>>>>>>>> next 2 slots pointing to the previous and next previous objects, >> >>>>>>>>> represented >> >>>>>>>>> just as their MutableArrayArray#s. 
I can use >> >>>>>>>>> sameMutableArrayArray# on these >> >>>>>>>>> for object identity, which lets me check for the ends of the >> >>>>>>>>> lists by tying >> >>>>>>>>> things back on themselves. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and below that >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to >> >>>>>>>>> an >> >>>>>>>>> upper structure. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting out the >> >>>>>>>>> slots >> >>>>>>>>> in question, while it has gained a level of indirection between >> >>>>>>>>> the wrapper >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can >> >>>>>>>>> be basically >> >>>>>>>>> erased by ghc. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on the heap >> >>>>>>>>> for >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the >> >>>>>>>>> object itself, >> >>>>>>>>> and the MutableByteArray# that it references to carry around the >> >>>>>>>>> mutable >> >>>>>>>>> int. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The only pain points are >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me >> >>>>>>>>> from >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an >> >>>>>>>>> ArrayArray >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of >> >>>>>>>>> Haskell, >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 pointers >> >>>>>>>>> wide. Card >> >>>>>>>>> marking doesn't help. 
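[Editorial aside, not part of the original message: the boxed starting point for the order-maintenance `Upper` structure described above can be sketched with base alone. The `MutableByteArray s` label is approximated by an `IORef Int` so the example stays runnable, and the `Eq` instance on `IORef` stands in for the `sameMutableArrayArray#` identity test mentioned in the thread; all names here are illustrative assumptions.]

```haskell
-- Hedged sketch: a circular doubly-linked "Upper" node whose list ends
-- are detected by object identity (self-links), as the thread suggests.
import Control.Monad.Fix (mfix)
import Data.IORef

data Upper = Upper { label :: IORef Int   -- stand-in for MutableByteArray s
                   , nextU :: IORef Upper
                   , prevU :: IORef Upper }

-- A circular singleton: both links point back at the node itself.
-- mfix ties the knot; newIORef never forces its argument, so this is safe.
singletonU :: Int -> IO Upper
singletonU l = mfix $ \u ->
  Upper <$> newIORef l <*> newIORef u <*> newIORef u

-- End-of-list test by identity: IORef's Eq instance plays the role that
-- sameMutableArrayArray# plays in the unlifted version.
isEnd :: Upper -> IO Bool
isEnd u = do
  n <- readIORef (nextU u)
  pure (label u == label n)
```

The unlifted version in the thread replaces the three boxed fields with slots of one `MutableArrayArray#`, collapsing the per-field indirections into a single heap object.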
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things and >> >>>>>>>>> convert >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >> >>>>>>>>> unsafeCoerce my way >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >> >>>>>>>>> directly into the >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by >> >>>>>>>>> aping the >> >>>>>>>>> MutableArrayArray# s API, but that gets really really dangerous! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the >> >>>>>>>>> altar >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them >> >>>>>>>>> and collect >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -Edward >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >> >>>>>>>>> > wrote: >> >>>>>>>>> >> >>>>>>>>> That?s an interesting idea. >> >>>>>>>>> >> >>>>>>>>> Manuel >> >>>>>>>>> >> >>>>>>>>> > Edward Kmett >: >> >>>>>>>>> >> >>>>>>>>> > >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >> >>>>>>>>> > ArrayArray# entries >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection for >> >>>>>>>>> > the containing >> >>>>>>>>> > structure is amazing, but I can only currently use it if my >> >>>>>>>>> > leaf level data >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd be >> >>>>>>>>> > nice to be >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at >> >>>>>>>>> > the leaves to >> >>>>>>>>> > hold lifted contents. 
>> >>>>>>>>> > >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to >> >>>>>>>>> > access >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do that >> >>>>>>>>> > if i tried to >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a >> >>>>>>>>> > ByteArray# >> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing >> >>>>>>>>> > this. >> >>>>>>>>> > >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I >> >>>>>>>>> > could shoehorn a >> >>>>>>>>> > number of them into ArrayArrays if this worked. >> >>>>>>>>> > >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >> >>>>>>>>> > indirection compared to c/java and this could reduce that pain >> >>>>>>>>> > to just 1 >> >>>>>>>>> > level of unnecessary indirection. >> >>>>>>>>> > >> >>>>>>>>> > -Edward >> >>>>>>>>> >> >>>>>>>>> > _______________________________________________ >> >>>>>>>>> > ghc-devs mailing list >> >>>>>>>>> > ghc-devs at haskell.org >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> _______________________________________________ >> >>>>>>>>> ghc-devs mailing list >> >>>>>>>>> ghc-devs at haskell.org >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>> >> >>>>>>> >> >>>>> >> >>> >> >> >> > >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > > > _______________________________________________ ghc-devs mailing list ghc-devs at haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simonpj at microsoft.com Mon Sep 7 14:35:50 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 14:35:50 +0000 Subject: Unpacking sum types In-Reply-To: References: Message-ID: Good start. I have updated the page to separate the source-language design (what the programmer sees) from the implementation. And I have included boxed sums as well -- it would be deeply strange not to do so. Looks good to me! Simon From: Johan Tibell [mailto:johan.tibell at gmail.com] Sent: 01 September 2015 18:24 To: Simon Peyton Jones; Simon Marlow; Ryan Newton Cc: ghc-devs at haskell.org Subject: RFC: Unpacking sum types I have a draft design for unpacking sum types that I'd like some feedback on. In particular feedback both on:

* the writing and clarity of the proposal and
* the proposal itself.

https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes -- Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From simonpj at microsoft.com Mon Sep 7 14:57:10 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 14:57:10 +0000 Subject: more releases In-Reply-To: <87si6wkdta.fsf@smart-cactus.org> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> <87si6wkdta.fsf@smart-cactus.org> Message-ID: <09dfe23cd20746c88beb0cfd308ef8f6@DB4PR30MB030.064d.mgd.msft.net> Merging and releasing a fix to the stable branch always carries a cost: it might break something else. There is a real cost to merging, which is why we've followed the lazy strategy that Ben describes. Still, even given the lazy strategy we could perfectly well put out minor releases more proactively; e.g. fix one bug (or a little batch) and release. Provided we could reduce the per-release costs.
Simon | -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ben | Gamari | Sent: 02 September 2015 17:05 | To: Richard Eisenberg | Cc: GHC developers | Subject: Re: more releases | | Richard Eisenberg writes: | | > I think some of my idea was misunderstood here: my goal was to have | > quick releases only from the stable branch. The goal would not be to | > release the new and shiny, but instead to get bugfixes out to users | > quicker. The new and shiny (master) would remain as it is now. In | > other words: more users would be affected by this change than just | the | > vanguard. | > | I see. This is something we could certainly do. | | It would require, however, that we be more pro-active about continuing | to merge things to the stable branch after the release. | Currently the stable branch is essentially in the same state that it | was in for the 7.10.2 release. I've left it this way as it takes time | and care to cherry-pick patches to stable. Thus far my policy has been | to perform this work lazily until it's clear that we will do another | stable release, as otherwise the effort may well be wasted. | | So, even if the steps of building, testing, and uploading the release | are streamlined, more frequent releases are still far from free. | Whether it's a worthwhile cost I don't know. | | This is a difficult question to answer without knowing more about how | typical users actually acquire GHC. For instance, this effort would | have minimal impact on users who get their compiler through their | distribution's package manager. On the other hand, if most users | download GHC bindists directly from the GHC download page, then | perhaps this would be effort well-spent.
| | Cheers, | | - Ben From ryan.gl.scott at gmail.com Mon Sep 7 15:26:01 2015 From: ryan.gl.scott at gmail.com (Ryan Scott) Date: Mon, 7 Sep 2015 11:26:01 -0400 Subject: Proposal: Automatic derivation of Lift Message-ID: There is a Lift typeclass defined in template-haskell [1] which, when a data type is an instance, permits it to be directly used in a TH quotation, like so:

    data Example = Example

    instance Lift Example where
      lift Example = conE (mkNameG_d "" "" "Example")

    e :: Example
    e = [| Example |]

Making Lift instances for most data types is straightforward and mechanical, so the proposal is to allow automatic derivation of Lift via a -XDeriveLift extension:

    data Example = Example deriving Lift

This is actually a pretty old proposal [2], dating back to 2007. I wanted to have this feature for my needs, so I submitted a proof-of-concept at the GHC Trac issue page [3]. The question now is: do we really want to bake this feature into GHC? Since not many people opined on the Trac page, I wanted to submit this here for wider visibility and to have a discussion. Here are some arguments I have heard against this feature (please tell me if I am misrepresenting your opinion):

* We already have a th-lift package [4] on Hackage which allows derivation of Lift via Template Haskell functions. In addition, if you're using Lift, chances are you're also using the -XTemplateHaskell extension in the first place, so th-lift should be suitable.
* The same functionality could be added via GHC generics (as of GHC 7.12/8.0, which adds the ability to reify a datatype's package name [5]), if -XTemplateHaskell can't be used.
* Adding another -XDerive- extension places a burden on GHC devs to maintain it in the future in response to further Template Haskell changes.

Here are my (opinionated) responses to each of these:

* th-lift isn't as fully-featured as a -XDerive- extension at the moment, since it can't do sophisticated type inference [6] or derive for data families.
This is something that could be addressed with a patch to th-lift, though.
* GHC generics wouldn't be enough to handle unlifted types like Int#, Char#, or Double# (which other -XDerive- extensions do).
* This is a subjective measurement, but in terms of the amount of code I had to add, -XDeriveLift was substantially simpler than other -XDerive extensions, because there are fewer weird corner cases. Plus, I'd volunteer to maintain it :)

Simon PJ wanted to know if other Template Haskell programmers would find -XDeriveLift useful. Would you be able to use it? Would you like to see a solution other than putting it into GHC? I'd love to hear feedback so we can bring some closure to this 8-year-old feature request. Ryan S.

-----
[1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift
[2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html
[3] https://ghc.haskell.org/trac/ghc/ticket/1830
[4] http://hackage.haskell.org/package/th-lift
[5] https://ghc.haskell.org/trac/ghc/ticket/10030
[6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11

From spam at scientician.net Mon Sep 7 16:05:56 2015 From: spam at scientician.net (Bardur Arantsson) Date: Mon, 7 Sep 2015 18:05:56 +0200 Subject: more releases In-Reply-To: <09dfe23cd20746c88beb0cfd308ef8f6@DB4PR30MB030.064d.mgd.msft.net> References: <3E39E8B5-89C2-40F6-9180-C6D73AF3926F@cis.upenn.edu> <87si6y1v30.fsf@gmail.com> <87oahlksnm.fsf@smart-cactus.org> <87si6wkdta.fsf@smart-cactus.org> <09dfe23cd20746c88beb0cfd308ef8f6@DB4PR30MB030.064d.mgd.msft.net> Message-ID: On 09/07/2015 04:57 PM, Simon Peyton Jones wrote: > Merging and releasing a fix to the stable branch always carries a cost: > it might break something else. There is a real cost to merging, which > is why we've followed the lazy strategy that Ben describes. > A valid point, but the upside is that it's a very fast operation to revert if a release is "bad"...
and get that updated release into the wild. Regards, From dan.doel at gmail.com Mon Sep 7 17:53:11 2015 From: dan.doel at gmail.com (Dan Doel) Date: Mon, 7 Sep 2015 13:53:11 -0400 Subject: Unpacking sum types In-Reply-To: References: Message-ID: Are we okay with stealing some operator sections for this? E.G. (x ||). I think the boxed sums larger than 2 choices are all technically overlapping with sections. On Mon, Sep 7, 2015 at 10:35 AM, Simon Peyton Jones wrote: > Good start. > > > > I have updated the page to separate the source-language design (what the > programmer sees) from the implementation. > > > > And I have included boxed sums as well ? it would be deeply strange not to > do so. > > > > Looks good to me! > > > > Simon > > > > From: Johan Tibell [mailto:johan.tibell at gmail.com] > Sent: 01 September 2015 18:24 > To: Simon Peyton Jones; Simon Marlow; Ryan Newton > Cc: ghc-devs at haskell.org > Subject: RFC: Unpacking sum types > > > > I have a draft design for unpacking sum types that I'd like some feedback > on. In particular feedback both on: > > > > * the writing and clarity of the proposal and > > * the proposal itself. > > > > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > > > > -- Johan > > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > From ezyang at mit.edu Mon Sep 7 17:57:43 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Mon, 07 Sep 2015 10:57:43 -0700 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <1441436053-sup-5590@sabre> Message-ID: <1441648640-sup-9581@sabre> Yes, I think you are right. I've restructured the spec so that 'Box' is an optional extension. 
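The observation in the excerpts that follow, that it is ordinary lazy data constructors which reintroduce suspension, can be demonstrated in current lifted Haskell. A minimal sketch: the `Box` and `unbox` names here are illustrative stand-ins defined locally, not the proposed unlifted-kind machinery from the spec.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- An ordinary lifted wrapper: its field is a thunk, so applying the
-- constructor suspends its argument instead of forcing it.
data Box a = Box a

unbox :: Box a -> a
unbox (Box x) = x

main :: IO ()
main = do
  let b = Box (error "boom" :: Int)
  -- Forcing b to WHNF only exposes the constructor; the field stays an
  -- unevaluated thunk, so no exception is raised here.
  case b of
    Box _ -> putStrLn "okay"
  -- Demanding the field itself is what finally hits the error.
  r <- try (evaluate (unbox b))
  putStrLn (case (r :: Either SomeException Int) of
              Left _  -> "forcing the field threw"
              Right _ -> "unexpected")
```

With a proper unlifted `Force a`, the first `case` would already have thrown; the whole design question in this thread is where that suspension boundary sits.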
Excerpts from Dan Doel's message of 2015-09-06 13:56:35 -0700: > On Sat, Sep 5, 2015 at 1:35 PM, Dan Doel wrote: > > Also, the constructor isn't exactly relevant, so much as whether the > > unlifted error occurs inside the definition of a lifted thing. > > So, in light of this, `Box` is not necessary to define `suspend`. We > can simply write: > > suspend :: Force a -> a > suspend (Force x) = x > > and the fact that `a` has kind * means that `suspend undefined` only > throws an exception if you inspect it. > > `Box` as currently defined (not the previous GADT definition) is novel > in that it allows you to suspend unlifted types that weren't derived > from `Force`. And it would probably be useful to have coercions > between `Box (Force a)` and `a`, and `Force (Box u)` and `u`. But (I > think) it is not necessary for mediating between `Force a` and `a`. > > -- Dan From ezyang at mit.edu Mon Sep 7 18:13:29 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Mon, 07 Sep 2015 11:13:29 -0700 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <1441390306-sup-6240@sabre> <1441400654-sup-1647@sabre> <1441436053-sup-5590@sabre> Message-ID: <1441648807-sup-660@sabre> Excerpts from Dan Doel's message of 2015-09-05 10:35:44 -0700: > I tried with `error` first, and it worked exactly the way I described. > > But I guess it's a type inference weirdness. If I annotate mv# with > MutVar# it will work, whereas otherwise it will be inferred that mv# > :: a where a :: *, instead of #. Whereas !x is a pattern which > requires monomorphism of x, and so it figures out mv# :: MutVar# .... > Kind of an odd corner case where breaking cycles causes things _not_ > to type check, due to open kinds not being first class. > > I thought I remembered that at some point it was decided that `let` > bindings of unboxed things should be required to have bangs on the > bindings, to indicate the evaluation order. 
Maybe I'm thinking of > something else (was it that it was originally required and we got rid > of it?). Ah yes, I added an explicit type signature, which is why I didn't see your problem. As for requiring bang, I think probably you are thinking of: commit 831a35dd00faff195cf938659c2dd736192b865f Author: Ian Lynagh Date: Fri Apr 24 12:47:54 2009 +0000 Require a bang pattern when unlifted types are where/let bound; #3182 For now we only get a warning, rather than an error, because the alex and happy templates don't follow the new rules yet. But Simon eventually made it be less chatty: commit 67157c5c25c8044b54419470b5e8cc677be060c3 Author: simonpj at microsoft.com Date: Tue Nov 16 17:18:43 2010 +0000 Warn a bit less often about unlifted bindings. Warn when (a) a pattern bindings binds unlifted values (b) it has no top-level bang (c) the RHS has a *lifted* type Clause (c) is new, argued for by Simon M Eg x# = 4# + 4# -- No warning (# a,b #) = blah -- No warning I# x = blah -- Warning Since in our cases the RHS is not lifted, no warning occurs. > > Nope, if you just float the error call out of MV, you will go from > > "Okay." to an exception. Notice that *data constructors* are what are > > used to induce suspension. This is why we don't have a 'suspend' > > special form; instead, 'Box' is used directly. > > I know that it's the floating that makes a difference, not the bang > pattern. The point would be to make the syntax require the bang > pattern to give a visual indication of when it happens, and make it > illegal to look like you're doing a normal let that doesn't change the > value (although having it actually be a bang pattern would be bad, > because it'd restrict polymorphism of the definition). I think this is a reasonable thing to ask for. I also think, with the commit set above, this very discussion happened in 2010, and was resolved in favor of not warning in this case for unboxed types. 
Maybe the situation is different with unlifted data types; it's hard for me to tell. > Also, the constructor isn't exactly relevant, so much as whether the > unlifted error occurs inside the definition of a lifted thing. For > instance, we can go from: > > let mv = MutVar undefined > > to: > > let mv = let mv# :: MutVar# RealWorld a ; mv# = undefined in MutVar mv# > > and the result is the same, because it is the definition of mv that is > lazy. Constructors in complex expressions---and all subexpressions for > that matter---just get compiled this way. E.G. > > let f :: MutVar# RealWorld a -> MutVar a > f mv# = f mv# > in flip const (f undefined) $ putStrLn "okay" > > No constructors involved, but no error. Yes, you are right. I incorrectly surmised that a suspension function would have to be a special form, but in fact, it does not need to be. > Okay. So, there isn't representational overhead, but there is > overhead, where you call a function or something (which will just > return its argument), whereas newtype constructors end up not having > any cost whatsoever? You might hope that it can get inlined away. But yes, a coercion would be best. Edward From jmcf125 at openmailbox.org Mon Sep 7 18:18:11 2015 From: jmcf125 at openmailbox.org (jmcf125 at openmailbox.org) Date: Mon, 7 Sep 2015 19:18:11 +0100 Subject: Cannot have GHC in ARMv6 architecture Message-ID: <20150907181811.GA1668@jmcf125-Acer-Arch.home> Hi, I have tried to get GHC onto my Raspberry Pi, and got stuck on issue 7754 (https://ghc.haskell.org/trac/ghc/ticket/7754), since I didn't know where to pass options to terminfo's configure file, although I did copy the headers from my Raspberry Pi. I've been using HUGS ever since, as Arch Linux doesn't have GHC for ARMv6, and deb2targz would not work. I'm aware I have a phase 0 compiler installed, need to build a phase 1 compiler, and use that to cross-compile GHC itself (I wasn't aware of this the first time I tried, and couldn't find as much information then as I can now).
I've read the following pages: https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/RaspberryPi https://ghc.haskell.org/trac/ghc/wiki/Building/CrossCompiling https://ghc.haskell.org/trac/ghc/wiki/CrossCompilation along with quite a few bug reports, and questions on Stack Overflow that seemed related but really aren't, or that are already exposed in the tickets mentioned below. Below, /home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses and /home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses are directories to which I copied any headers and libraries, respectively, from my Raspberry Pi. I'm not sure which libraries I am supposed to point configure at. These are the headers-libraries combinations I tried: $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses --with-curses-libraries=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses && make -j5 (...) checking for unistd.h... yes checking ncurses.h usability... no checking ncurses.h presence... no checking for ncurses.h... no checking curses.h usability... no checking curses.h presence... no checking for curses.h... no configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo': configure: error: curses headers could not be found, so this package cannot be built See `config.log' for more details libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 1 Makefile:71: recipe for target 'all' failed make: *** [all] Error 2 (https://ghc.haskell.org/trac/ghc/ticket/7754) $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/usr/include --with-curses-libraries=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses && make -j5 (...) checking for unistd.h... yes checking ncurses.h usability... yes checking ncurses.h presence... 
yes checking for ncurses.h... yes checking for setupterm in -ltinfo... no checking for setupterm in -lncursesw... no checking for setupterm in -lncurses... no checking for setupterm in -lcurses... no configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo': configure: error: curses library not found, so this package cannot be built See `config.log' for more details libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 1 Makefile:71: recipe for target 'all' failed make: *** [all] Error 2 (https://ghc.haskell.org/trac/ghc/ticket/7281) $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/usr/include --with-curses-libraries=/usr/lib && make -j5 $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses --with-curses-libraries=/usr/lib && make -j5 (...) Configuring terminfo-0.4.0.1... configure: WARNING: unrecognized options: --with-compiler, --with-gcc checking for arm-unknown-linux-gnueabihf-gcc... /home/jmcf125/ghc-raspberry-pi/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin/arm-linux-gnueabihf-gcc checking whether the C compiler works... no configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo': configure: error: C compiler cannot create executables See `config.log' for more details libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 77 Makefile:71: recipe for target 'all' failed make: *** [all] Error 2 Concerning the last 2, it'd be odd if they worked, how could, say, the GCC cross-compiler for ARM use x86_64 libraries? I tried them anyway, since the 1st 2 errors didn't seem to make sense... 
Also, options --includedir and --oldincludedir seem to have no effect (always get issue 7754), and it doesn't matter if I build registerised or not, the results are the same (registerised, I'm using LLVM 3.6.2-3). I'm sorry if I'm wasting your time with what to you might seem such a simple thing, and not real development on GHC, but I don't know where else to turn to. I'm still learning Haskell, and have never had to compile compilers before. Thank you in advance, João Miguel From ezyang at mit.edu Mon Sep 7 18:30:16 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Mon, 07 Sep 2015 11:30:16 -0700 Subject: Using GHC API to compile Haskell file In-Reply-To: References: <1440368677-sup-472@sabre> Message-ID: <1441649731-sup-8699@sabre> Hello Neil, It looks like my second message got eaten. Let's try again. > 1) Is there any way to do the two compilations sharing some cached > state, e.g. loaded packages/.hi files, so each compilation goes > faster. You can, using withTempSession in the GhcMonad. The external package state will be preserved across calls here, but things put in the HPT will get thrown out. > 2) Is there any way to do the link alone through the GHC API. I am confused by your code. There are two ways you can do linking: 1. Explicitly specify all of the objects to link together. This works even if the source files aren't available. 2. Run ghc --make. This does dependency analysis to figure out what objects to link together, but since everything is already compiled, it just links. Your code seems to be trying to do (1) and (2) simultaneously (you set the mode to OneShot, but then you call load which calls into GhcMake). If you want to use (1), stop calling load and call 'oneShot' instead. If you want to use (2), just reuse your working --make code. (BTW, how did I figure this all out? By looking at ghc/Main.hs).
Cheers, Edward From tomberek at gmail.com Mon Sep 7 18:36:22 2015 From: tomberek at gmail.com (Thomas Bereknyei) Date: Mon, 7 Sep 2015 14:36:22 -0400 Subject: Proposal: Automatic derivation of Lift In-Reply-To: References: Message-ID: Yes, I would find DeriveLift useful and a pleasant improvement to the Template Haskell ecosystem. I am relatively new to TH and was wondering about a few things (if this hijacks the thread we can start a new one):

Other quotations: [m| for 'Q Match' would be helpful to define collections of matches that can be combined and manipulated. One can use Q (Pat,Body,[Decl]) but you lose the ability for the Body to refer to a variable bound in the Pat. One can use Q Exp for just a lambda, but you can't just combine lambdas to create a Match expression without some machinery.

Promotion of a Pat to an Exp: a subset of Pat can create an expression such that \ $pat -> $(promote pat) is id.

Tom

There is a Lift typeclass defined in template-haskell [1] which, when a data type is an instance, permits it to be directly used in a TH quotation, like so:

    data Example = Example

    instance Lift Example where
      lift Example = conE (mkNameG_d "" "" "Example")

    e :: Example
    e = [| Example |]

Making Lift instances for most data types is straightforward and mechanical, so the proposal is to allow automatic derivation of Lift via a -XDeriveLift extension:

    data Example = Example deriving Lift

This is actually a pretty old proposal [2], dating back to 2007. I wanted to have this feature for my needs, so I submitted a proof-of-concept at the GHC Trac issue page [3]. The question now is: do we really want to bake this feature into GHC? Since not many people opined on the Trac page, I wanted to submit this here for wider visibility and to have a discussion.
Here are some arguments I have heard against this feature (please tell me if I am misrepresenting your opinion):

* We already have a th-lift package [4] on Hackage which allows derivation of Lift via Template Haskell functions. In addition, if you're using Lift, chances are you're also using the -XTemplateHaskell extension in the first place, so th-lift should be suitable.
* The same functionality could be added via GHC generics (as of GHC 7.12/8.0, which adds the ability to reify a datatype's package name [5]), if -XTemplateHaskell can't be used.
* Adding another -XDerive- extension places a burden on GHC devs to maintain it in the future in response to further Template Haskell changes.

Here are my (opinionated) responses to each of these:

* th-lift isn't as fully-featured as a -XDerive- extension at the moment, since it can't do sophisticated type inference [6] or derive for data families. This is something that could be addressed with a patch to th-lift, though.
* GHC generics wouldn't be enough to handle unlifted types like Int#, Char#, or Double# (which other -XDerive- extensions do).
* This is a subjective measurement, but in terms of the amount of code I had to add, -XDeriveLift was substantially simpler than other -XDerive extensions, because there are fewer weird corner cases. Plus, I'd volunteer to maintain it :)

Simon PJ wanted to know if other Template Haskell programmers would find -XDeriveLift useful. Would you be able to use it? Would you like to see a solution other than putting it into GHC? I'd love to hear feedback so we can bring some closure to this 8-year-old feature request. Ryan S.
----- [1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift [2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html [3] https://ghc.haskell.org/trac/ghc/ticket/1830 [4] http://hackage.haskell.org/package/th-lift [5] https://ghc.haskell.org/trac/ghc/ticket/10030 [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11 _______________________________________________ ghc-devs mailing list ghc-devs at haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Mon Sep 7 19:10:34 2015 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Mon, 7 Sep 2015 21:10:34 +0200 Subject: Proposal: Automatic derivation of Lift In-Reply-To: References: Message-ID: Continuing my support of the generics route. Is there a fundamental reason why it couldn't handle unlifted types? Given their relative paucity, it seems like a fair compromise to generically define lift instances for all normal data types but require TH for unlifted types. This approach seems much smoother from a maintenance perspective. On Mon, Sep 7, 2015 at 5:26 PM, Ryan Scott wrote: > There is a Lift typeclass defined in template-haskell [1] which, when > a data type is an instance, permits it to be directly used in a TH > quotation, like so > > data Example = Example > > instance Lift Example where > lift Example = conE (mkNameG_d "" "" "Example") > > e :: Example > e = [| Example |] > > Making Lift instances for most data types is straightforward and > mechanical, so the proposal is to allow automatic derivation of Lift > via a -XDeriveLift extension: > > data Example = Example deriving Lift > > This is actually a pretty a pretty old proposal [2], dating back to > 2007. I wanted to have this feature for my needs, so I submitted a > proof-of-concept at the GHC Trac issue page [3]. 
> > The question now is: do we really want to bake this feature into GHC? > Since not many people opined on the Trac page, I wanted to submit this > here for wider visibility and to have a discussion. > > Here are some arguments I have heard against this feature (please tell > me if I am misrepresenting your opinion): > > * We already have a th-lift package [4] on Hackage which allows > derivation of Lift via Template Haskell functions. In addition, if > you're using Lift, chances are you're also using the -XTemplateHaskell > extension in the first place, so th-lift should be suitable. > * The same functionality could be added via GHC generics (as of GHC > 7.12/8.0, which adds the ability to reify a datatype's package name > [5]), if -XTemplateHaskell can't be used. > * Adding another -XDerive- extension places a burden on GHC devs to > maintain it in the future in response to further Template Haskell > changes. > > Here are my (opinionated) responses to each of these: > > * th-lift isn't as fully-featured as a -XDerive- extension at the > moment, since it can't do sophisticated type inference [6] or derive > for data families. This is something that could be addressed with a > patch to th-lift, though. > * GHC generics wouldn't be enough to handle unlifted types like Int#, > Char#, or Double# (which other -XDerive- extensions do). > * This is a subjective measurement, but in terms of the amount of code > I had to add, -XDeriveLift was substantially simpler than other > -XDerive extensions, because there are fewer weird corner cases. Plus, > I'd volunteer to maintain it :) > > Simon PJ wanted to know if other Template Haskell programmers would > find -XDeriveLift useful. Would you be able to use it? Would you like > to see a solution other than putting it into GHC? I'd love to hear > feedback so we can bring some closure to this 8-year-old feature > request. > > Ryan S. 
> > ----- > [1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift > [2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html > [3] https://ghc.haskell.org/trac/ghc/ticket/1830 > [4] http://hackage.haskell.org/package/th-lift > [5] https://ghc.haskell.org/trac/ghc/ticket/10030 > [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11 > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simonpj at microsoft.com Mon Sep 7 19:25:57 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 19:25:57 +0000 Subject: Unpacking sum types In-Reply-To: References: Message-ID: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> | Are we okay with stealing some operator sections for this? E.G. (x | ||). I think the boxed sums larger than 2 choices are all technically | overlapping with sections. I hadn't thought of that. I suppose that in distfix notation we could require spaces (x | |) since vertical bar by itself isn't an operator. But then (_||) x might feel more compact. Also a section (x ||) isn't valid in a pattern, so we would not need to require spaces there. But my gut feel is: yes, with AnonymousSums we should just steal the syntax. It won't hurt existing code (since it won't use AnonymousSums), and if you *are* using AnonymousSums then the distfix notation is probably more valuable than the sections for an operator you probably aren't using. I've updated the wiki page Simon | -----Original Message----- | From: Dan Doel [mailto:dan.doel at gmail.com] | Sent: 07 September 2015 18:53 | To: Simon Peyton Jones | Cc: Johan Tibell; Simon Marlow; Ryan Newton; ghc-devs at haskell.org | Subject: Re: Unpacking sum types | | Are we okay with stealing some operator sections for this? E.G. (x | ||). 
I think the boxed sums larger than 2 choices are all technically | overlapping with sections. | | On Mon, Sep 7, 2015 at 10:35 AM, Simon Peyton Jones | wrote: | > Good start. | > | > | > | > I have updated the page to separate the source-language design (what | the | > programmer sees) from the implementation. | > | > | > | > And I have included boxed sums as well -- it would be deeply strange not | to | > do so. | > | > | > | > Looks good to me! | > | > | > | > Simon | > | > | > | > From: Johan Tibell [mailto:johan.tibell at gmail.com] | > Sent: 01 September 2015 18:24 | > To: Simon Peyton Jones; Simon Marlow; Ryan Newton | > Cc: ghc-devs at haskell.org | > Subject: RFC: Unpacking sum types | > | > | > | > I have a draft design for unpacking sum types that I'd like some | feedback | > on. In particular feedback both on: | > | > | > | > * the writing and clarity of the proposal and | > | > * the proposal itself. | > | > | > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes | > | > | > | > -- Johan | > | > | > | > | > _______________________________________________ | > ghc-devs mailing list | > ghc-devs at haskell.org | > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs | > From karel.gardas at centrum.cz Mon Sep 7 19:37:21 2015 From: karel.gardas at centrum.cz (Karel Gardas) Date: Mon, 07 Sep 2015 21:37:21 +0200 Subject: Cannot have GHC in ARMv6 architecture In-Reply-To: <20150907181811.GA1668@jmcf125-Acer-Arch.home> References: <20150907181811.GA1668@jmcf125-Acer-Arch.home> Message-ID: <55EDE771.9010404@centrum.cz> Hi, I think the sysroot option may help you.
I wrote something about it for ARMv8 in the past here: https://ghcarm.wordpress.com/2014/01/18/unregisterised-ghc-head-build-for-arm64-platform/ Cheers, Karel On 09/ 7/15 08:18 PM, jmcf125 at openmailbox.org wrote: > Hi, > > I have tried to have GHC in my Raspberry Pi, got stuck in the issue 7754 > (https://ghc.haskell.org/trac/ghc/ticket/7754), since I didn't know > where to pass options to terminfo's configure file, although I did copy > the headers from my Raspberry Pi. I've been using HUGS ever since, as > Arch Linux doesn't have GHC for ARMv6, and deb2targz would not work. > > I'm aware I have a phase 0 compiler installed, need to build a phase 1 > compiler, and use that to cross-compile GHC itself (I wasn't the 1st > time I tried, couldn't find as much information as now). > > I've read the following pages: > https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/RaspberryPi > https://ghc.haskell.org/trac/ghc/wiki/Building/CrossCompiling > https://ghc.haskell.org/trac/ghc/wiki/CrossCompilation > along with quite a few bug reports, and questions on Stack Overflow that > seemed related but really aren't, or that are already exposed in the > tickets mentioned below. > > Below, > /home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses and > /home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses are > directories to which I copied any headers and libraries, respectively, > from my Raspberry Pi. > > I'm not sure which libraries I am supposed to point configure at. These > are the headers-libraries combinations I tried: > > $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses --with-curses-libraries=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses && make -j5 > (...) > checking for unistd.h... yes > checking ncurses.h usability... no > checking ncurses.h presence... no > checking for ncurses.h... no > checking curses.h usability... 
no > checking curses.h presence... no > checking for curses.h... no > configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo': > configure: error: curses headers could not be found, so this package cannot be built > See `config.log' for more details > libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed > make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 1 > Makefile:71: recipe for target 'all' failed > make: *** [all] Error 2 > (https://ghc.haskell.org/trac/ghc/ticket/7754) > > $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/usr/include --with-curses-libraries=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/lib-curses && make -j5 > (...) > checking for unistd.h... yes > checking ncurses.h usability... yes > checking ncurses.h presence... yes > checking for ncurses.h... yes > checking for setupterm in -ltinfo... no > checking for setupterm in -lncursesw... no > checking for setupterm in -lncurses... no > checking for setupterm in -lcurses... no > configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo': > configure: error: curses library not found, so this package cannot be built > See `config.log' for more details > libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed > make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 1 > Makefile:71: recipe for target 'all' failed > make: *** [all] Error 2 > (https://ghc.haskell.org/trac/ghc/ticket/7281) > > $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/usr/include --with-curses-libraries=/usr/lib && make -j5 > > $ ./configure --target=arm-linux-gnueabihf --with-curses-includes=/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo/include-curses --with-curses-libraries=/usr/lib && make -j5 > (...) > Configuring terminfo-0.4.0.1... 
> configure: WARNING: unrecognized options: --with-compiler, --with-gcc
> checking for arm-unknown-linux-gnueabihf-gcc...
> /home/jmcf125/ghc-raspberry-pi/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin/arm-linux-gnueabihf-gcc
> checking whether the C compiler works... no
> configure: error: in `/home/jmcf125/ghc-raspberry-pi/ghc/libraries/terminfo':
> configure: error: C compiler cannot create executables
> See `config.log' for more details
> libraries/terminfo/ghc.mk:4: recipe for target 'libraries/terminfo/dist-install/package-data.mk' failed
> make[1]: *** [libraries/terminfo/dist-install/package-data.mk] Error 77
> Makefile:71: recipe for target 'all' failed
> make: *** [all] Error 2
>
> Concerning the last 2, it'd be odd if they worked: how could, say, the
> GCC cross-compiler for ARM use x86_64 libraries? I tried them anyway,
> since the 1st 2 errors didn't seem to make sense...
>
> Also, options --includedir and --oldincludedir seem to have no effect
> (always get issue 7754), and it doesn't matter if I build registerised
> or not, the results are the same (registerised, I'm using LLVM 3.6.2-3).
>
> I'm sorry if I'm wasting your time with what to you might seem such a
> simple thing, and not real development on GHC, but I don't know where
> else to turn to. I'm still learning Haskell, and have never had to
> compile compilers before.
> > Thank you in advance,
> João Miguel
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>

From simonpj at microsoft.com Mon Sep 7 19:41:05 2015
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Mon, 7 Sep 2015 19:41:05 +0000
Subject: Unlifted data types
In-Reply-To: <1441381504-sup-5051@sabre>
References: <1441353701-sup-9422@sabre> <1441380599.3893947.374883985.0FBB1F3A@webmail.messagingengine.com> <1441381088-sup-172@sabre> <1441381504-sup-5051@sabre>
Message-ID: <191e443a7b3049dab4a2384779c1dfda@DB4PR30MB030.064d.mgd.msft.net>

| Michael Greenberg points out on Twitter that suspend must be a special
| form, just like lambda abstraction.

This isn't reflected on the wiki.

Simon

| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Edward
| Z. Yang
| Sent: 04 September 2015 16:46
| To: Eric Seidel; ghc-devs
| Subject: Re: Unlifted data types
|
| Excerpts from Edward Z. Yang's message of 2015-09-04 08:43:48 -0700:
| > Yes. Actually, you have a good point that we'd like to have functions
| > 'force :: Int -> !Int' and 'suspend :: !Int -> Int'. Unfortunately, we
| > can't generate 'Coercible' instances for these types unless Coercible becomes
| > polykinded. Perhaps we can make a new type class, or just magic
| > polymorphic functions.
|
| Michael Greenberg points out on Twitter that suspend must be a special
| form, just like lambda abstraction.
| | Edward | _______________________________________________ | ghc-devs mailing list | ghc-devs at haskell.org | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simonpj at microsoft.com Mon Sep 7 20:00:04 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 20:00:04 +0000 Subject: Unlifted data types In-Reply-To: <1441353701-sup-9422@sabre> References: <1441353701-sup-9422@sabre> Message-ID: <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> | After many discussions and beers at ICFP, I've written up my current | best understanding of the unlifted data types proposal: | | https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes Too many beers! I like some, but not all, of this. There are several distinct things being mixed up. (1) First, a proposal to allow a data type to be declared to be unlifted. On its own, this is a pretty simple proposal: * Data types are always boxed, and so would unlifted data types * But an unlifted data type does not include bottom, and operationally is always represented by a pointer to the value. Just like Array#. * ALL the evaluation rules are IDENTICAL to those for other unlifted types such as Int# and Array#. Lets are strict, and cannot be recursive, function arguments are evaluated before the call. Literally nothing new here. * The code generator can generate more efficient case expressions, because the pointer always points to a value, never to a thunk or (I believe) an indirection. I think there are some special cases in GC and the RTS to ensure that this invariant holds. And that's it. Syntax: I'd suggest something more prominent than an Unlifted return kind, such as data unlifted T a = L a | R a but I would not die for this. I would really like to see this articulated as a stand-alone proposal. It makes sense by itself, and is really pretty simple. (2) Second, we cannot expect levity polymorphism. 
Consider

   map f (x:xs) = f x : map f xs

Is the (f x) a thunk or is it evaluated strictly? Unless you are going to
clone the code for map (which levity polymorphism is there to avoid), we
can't answer "it depends on the type of (f x)". So, no, I think levity
polymorphism is out. So I vote against splitting # into two: plain will do
just fine.

(3) Third, the stuff about Force and suspend. Provided you do no more than
write library code that uses the above new features, I'm fine. But there
seems to be lots of stuff that dances around the hope that (Force a) is
represented the same way as 'a'. I don't know how to make this fly. Is
there a coercion in FC? If so then (a ~R Force a). And that seems very
doubtful since we must do some evaluation. I got lost in all the traffic
about it.

(4) Fourth, you don't mention a related suggestion, namely to allow

   newtype T = MkT Int#

with T getting kind #. I see no difficulty here. We do have (T ~R Int#).
It's just a useful way of wrapping a newtype around an unlifted type.

My suggestion: let's nail down (1), including a boxed version of Force and
suspend as plain library code, if you want, and perhaps (4); and only THEN
tackle the trickiness of unboxing Force.

Simon

| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Edward
| Z. Yang
| Sent: 04 September 2015 09:04
| To: ghc-devs
| Subject: Unlifted data types
|
| Hello friends,
|
| After many discussions and beers at ICFP, I've written up my current
| best understanding of the unlifted data types proposal:
|
| https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes
|
| Many thanks to Richard, Iavor, Ryan, Simon, Duncan, George, Paul,
| Edward Kmett, and any others who I may have forgotten for crystallizing
| this proposal.
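[Editor's note: the "boxed version of Force and suspend as plain library code" that Simon suggests can be sketched as below. This is an illustrative sketch only, not the wiki proposal's API; the module and names are assumptions.]

```haskell
-- A boxed stand-in for the proposed Force type. Because Force is a
-- newtype, (Force a) has the same representation as a; the library
-- enforces the "no thunk inside" reading by building values only
-- through force, which seq's the payload first.
newtype Force a = Force a

force :: a -> Force a
force a = a `seq` Force a

suspend :: Force a -> a
suspend (Force a) = a
```

With the Force constructor kept abstract, `suspend (force x)` evaluates x exactly once; whether an *unboxed* Force can share this representation is exactly the (a ~R Force a) question raised above.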
| | Cheers, | Edward | _______________________________________________ | ghc-devs mailing list | ghc-devs at haskell.org | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From ryan.gl.scott at gmail.com Mon Sep 7 20:02:45 2015 From: ryan.gl.scott at gmail.com (Ryan Scott) Date: Mon, 7 Sep 2015 16:02:45 -0400 Subject: Proposal: Automatic derivation of Lift In-Reply-To: References: Message-ID: Unlifted types can't be used polymorphically or in instance declarations, so this makes it impossible to do something like instance Generic Int# or store an Int# in one branch of a (:*:), preventing generics from doing anything in #-land. (unless someone has found a way to hack around this). I would be okay with implementing a generics-based approach, but we'd have to add a caveat that it will only work out-of-the-box on GHC 8.0 or later, due to TH's need to look up package information. (We could give users the ability to specify a package name manually as a workaround.) If this were added, where would be the best place to put it? th-lift? generic-deriving? template-haskell? A new package (lift-generics)? Ryan S. On Mon, Sep 7, 2015 at 3:10 PM, Matthew Pickering wrote: > Continuing my support of the generics route. Is there a fundamental > reason why it couldn't handle unlifted types? Given their relative > paucity, it seems like a fair compromise to generically define lift > instances for all normal data types but require TH for unlifted types. > This approach seems much smoother from a maintenance perspective. 
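[Editor's note: the mechanical instance that -XDeriveLift (or a generics-based lift) would write for the user can be sketched by hand. Point is a hypothetical example type here, following the Example pattern in Ryan's original message; recent template-haskell versions also ask for liftTyped, which this 2015-era sketch omits.]

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH
import Language.Haskell.TH.Syntax (Lift (..))

data Point = Point Int Int

-- The shape a derived instance would take: apply the constructor's
-- Name to each field, lifting the fields recursively.
instance Lift Point where
  lift (Point x y) = conE 'Point `appE` lift x `appE` lift y
```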
> > On Mon, Sep 7, 2015 at 5:26 PM, Ryan Scott wrote:
>> There is a Lift typeclass defined in template-haskell [1] which, when
>> a data type is an instance, permits it to be directly used in a TH
>> quotation, like so
>>
>> data Example = Example
>>
>> instance Lift Example where
>>   lift Example = conE (mkNameG_d "" "" "Example")
>>
>> e :: Example
>> e = [| Example |]
>>
>> Making Lift instances for most data types is straightforward and
>> mechanical, so the proposal is to allow automatic derivation of Lift
>> via a -XDeriveLift extension:
>>
>> data Example = Example deriving Lift
>>
>> This is actually a pretty old proposal [2], dating back to
>> 2007. I wanted to have this feature for my needs, so I submitted a
>> proof-of-concept at the GHC Trac issue page [3].
>>
>> The question now is: do we really want to bake this feature into GHC?
>> Since not many people opined on the Trac page, I wanted to submit this
>> here for wider visibility and to have a discussion.
>>
>> Here are some arguments I have heard against this feature (please tell
>> me if I am misrepresenting your opinion):
>>
>> * We already have a th-lift package [4] on Hackage which allows
>> derivation of Lift via Template Haskell functions. In addition, if
>> you're using Lift, chances are you're also using the -XTemplateHaskell
>> extension in the first place, so th-lift should be suitable.
>> * The same functionality could be added via GHC generics (as of GHC
>> 7.12/8.0, which adds the ability to reify a datatype's package name
>> [5]), if -XTemplateHaskell can't be used.
>> * Adding another -XDerive- extension places a burden on GHC devs to
>> maintain it in the future in response to further Template Haskell
>> changes.
>>
>> Here are my (opinionated) responses to each of these:
>>
>> * th-lift isn't as fully-featured as a -XDerive- extension at the
>> moment, since it can't do sophisticated type inference [6] or derive
>> for data families.
This is something that could be addressed with a >> patch to th-lift, though. >> * GHC generics wouldn't be enough to handle unlifted types like Int#, >> Char#, or Double# (which other -XDerive- extensions do). >> * This is a subjective measurement, but in terms of the amount of code >> I had to add, -XDeriveLift was substantially simpler than other >> -XDerive extensions, because there are fewer weird corner cases. Plus, >> I'd volunteer to maintain it :) >> >> Simon PJ wanted to know if other Template Haskell programmers would >> find -XDeriveLift useful. Would you be able to use it? Would you like >> to see a solution other than putting it into GHC? I'd love to hear >> feedback so we can bring some closure to this 8-year-old feature >> request. >> >> Ryan S. >> >> ----- >> [1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift >> [2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html >> [3] https://ghc.haskell.org/trac/ghc/ticket/1830 >> [4] http://hackage.haskell.org/package/th-lift >> [5] https://ghc.haskell.org/trac/ghc/ticket/10030 >> [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11 >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From ekmett at gmail.com Mon Sep 7 20:13:59 2015 From: ekmett at gmail.com (Edward Kmett) Date: Mon, 7 Sep 2015 16:13:59 -0400 Subject: ArrayArrays In-Reply-To: <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: I volunteered to write something up with the caveat that it would take me a while after the conference ended to get time to do so. 
I'll see what I can do. -Edward On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones wrote: > It was fun to meet and discuss this. > > > > Did someone volunteer to write a wiki page that describes the proposed > design? And, I earnestly hope, also describes the menagerie of currently > available array types and primops so that users can have some chance of > picking the right one?! > > > > Thanks > > > > Simon > > > > *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan > Newton > *Sent:* 31 August 2015 23:11 > *To:* Edward Kmett; Johan Tibell > *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; > Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays > > > > Dear Edward, Ryan Yates, and other interested parties -- > > > > So when should we meet up about this? > > > > May I propose the Tues afternoon break for everyone at ICFP who is > interested in this topic? We can meet out in the coffee area and > congregate around Edward Kmett, who is tall and should be easy to find ;-). > > > > I think Ryan is going to show us how to use his new primops for combined > array + other fields in one heap object? > > > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: > > Without a custom primitive it doesn't help much there, you have to store > the indirection to the mask. > > > > With a custom primitive it should cut the on heap root-to-leaf path of > everything in the HAMT in half. A shorter HashMap was actually one of the > motivating factors for me doing this. It is rather astoundingly difficult > to beat the performance of HashMap, so I had to start cheating pretty > badly. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > wrote: > > I'd also be interested to chat at ICFP to see if I can use this for my > HAMT implementation. > > > > On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote: > > Sounds good to me. 
Right now I'm just hacking up composable accessors for > "typed slots" in a fairly lens-like fashion, and treating the set of slots > I define and the 'new' function I build for the data type as its API, and > build atop that. This could eventually graduate to template-haskell, but > I'm not entirely satisfied with the solution I have. I currently > distinguish between what I'm calling "slots" (things that point directly to > another SmallMutableArrayArray# sans wrapper) and "fields" which point > directly to the usual Haskell data types because unifying the two notions > meant that I couldn't lift some coercions out "far enough" to make them > vanish. > > > > I'll be happy to run through my current working set of issues in person > and -- as things get nailed down further -- in a longer lived medium than > in personal conversations. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote: > > I'd also love to meet up at ICFP and discuss this. I think the array > primops plus a TH layer that lets (ab)use them many times without too much > marginal cost sounds great. And I'd like to learn how we could be either > early users of, or help with, this infrastructure. > > > > CC'ing in Ryan Scot and Omer Agacan who may also be interested in dropping > in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student who is > currently working on concurrent data structures in Haskell, but will not be > at ICFP. > > > > > > On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote: > > I completely agree. I would love to spend some time during ICFP and > friends talking about what it could look like. My small array for STM > changes for the RTS can be seen here [1]. It is on a branch somewhere > between 7.8 and 7.10 and includes irrelevant STM bits and some > confusing naming choices (sorry), but should cover all the details > needed to implement it for a non-STM context. 
The biggest surprise
> for me was following small array too closely and having a word/byte
> offset mismatch [2].
>
> [1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut
> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413
>
> Ryan
>
> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote:
> > I'd love to have that last 10%, but it's a lot of work to get there and
> > more importantly I don't know quite what it should look like.
> >
> > On the other hand, I do have a pretty good idea of how the primitives
> > above could be banged out and tested in a long evening, well in time for
> > 7.12. And as noted earlier, those remain useful even if a nicer typed
> > version with an extra level of indirection to the sizes is built up after.
> >
> > The rest sounds like a good graduate student project for someone who has
> > graduate students lying around. Maybe somebody at Indiana University who
> > has an interest in type theory and parallelism can find us one. =)
> >
> > -Edward
> >
> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates wrote:
> >>
> >> I think from my perspective, the motivation for getting the type
> >> checker involved is primarily bringing this to the level where users
> >> could be expected to build these structures. It is reasonable to
> >> think that there are people who want to use STM (a context with
> >> mutation already) to implement a straightforward data structure that
> >> avoids extra indirection penalty. There should be some places where
> >> knowing that things are field accesses rather than array indexing
> >> could be helpful, but I think GHC is good right now about handling
> >> constant offsets. In my code I don't do any bounds checking as I know
> >> I will only be accessing my arrays with constant indexes. I make
> >> wrappers for each field access and leave all the unsafe stuff in
> >> there. When things go wrong though, the compiler is no help.
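[Editor's note: Ryan's "wrappers for each field access" idiom can be sketched in ordinary boxed Haskell with the array package bundled with GHC. This is a simplification of what he describes — his real code uses unlifted STM arrays and skips the bounds checks — and Node, newNode, getA/getB/setA are hypothetical names.]

```haskell
import Data.Array.IO

-- A two-field mutable "struct" faked as an array, with the constant
-- indices hidden behind named accessors so use sites read like fields.
newtype Node = Node (IOArray Int Int)

newNode :: Int -> Int -> IO Node
newNode a b = Node <$> newListArray (0, 1) [a, b]

getA, getB :: Node -> IO Int
getA (Node arr) = readArray arr 0   -- "field" A lives at index 0
getB (Node arr) = readArray arr 1   -- "field" B lives at index 1

setA :: Node -> Int -> IO ()
setA (Node arr) = writeArray arr 0
```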
Maybe > >> template Haskell that generates the appropriate wrappers is the right > >> direction to go. > >> There is another benefit for me when working with these as arrays in > >> that it is quite simple and direct (given the hoops already jumped > >> through) to play with alignment. I can ensure two pointers are never > >> on the same cache-line by just spacing things out in the array. > >> > >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett wrote: > >> > They just segfault at this level. ;) > >> > > >> > Sent from my iPhone > >> > > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton wrote: > >> > > >> > You presumably also save a bounds check on reads by hard-coding the > >> > sizes? > >> > > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett > wrote: > >> >> > >> >> Also there are 4 different "things" here, basically depending on two > >> >> independent questions: > >> >> > >> >> a.) if you want to shove the sizes into the info table, and > >> >> b.) if you want cardmarking. > >> >> > >> >> Versions with/without cardmarking for different sizes can be done > >> >> pretty > >> >> easily, but as noted, the infotable variants are pretty invasive. > >> >> > >> >> -Edward > >> >> > >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > wrote: > >> >>> > >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up > >> >>> if > >> >>> they were small enough and there are enough of them. You get a bit > >> >>> better > >> >>> locality of reference in terms of what fits in the first cache line > of > >> >>> them. > >> >>> > >> >>> -Edward > >> >>> > >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >> >>> wrote: > >> >>>> > >> >>>> Yes. And for the short term I can imagine places we will settle > with > >> >>>> arrays even if it means tracking lengths unnecessarily and > >> >>>> unsafeCoercing > >> >>>> pointers whose types don't actually match their siblings. 
> >> >>>> > >> >>>> Is there anything to recommend the hacks mentioned for fixed sized > >> >>>> array > >> >>>> objects *other* than using them to fake structs? (Much to > >> >>>> derecommend, as > >> >>>> you mentioned!) > >> >>>> > >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >> >>>> wrote: > >> >>>>> > >> >>>>> I think both are useful, but the one you suggest requires a lot > more > >> >>>>> plumbing and doesn't subsume all of the usecases of the other. > >> >>>>> > >> >>>>> -Edward > >> >>>>> > >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> So that primitive is an array like thing (Same pointed type, > >> >>>>>> unbounded > >> >>>>>> length) with extra payload. > >> >>>>>> > >> >>>>>> I can see how we can do without structs if we have arrays, > >> >>>>>> especially > >> >>>>>> with the extra payload at front. But wouldn't the general > solution > >> >>>>>> for > >> >>>>>> structs be one that that allows new user data type defs for # > >> >>>>>> types? > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >> >>>>>> wrote: > >> >>>>>>> > >> >>>>>>> Some form of MutableStruct# with a known number of words and a > >> >>>>>>> known > >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting > >> >>>>>>> above, but > >> >>>>>>> where the word counts were stored in the objects themselves. > >> >>>>>>> > >> >>>>>>> Given that it'd have a couple of words for those counts it'd > >> >>>>>>> likely > >> >>>>>>> want to be something we build in addition to MutVar# rather > than a > >> >>>>>>> replacement. > >> >>>>>>> > >> >>>>>>> On the other hand, if we had to fix those numbers and build info > >> >>>>>>> tables that knew them, and typechecker support, for instance, > it'd > >> >>>>>>> get > >> >>>>>>> rather invasive. 
> >> >>>>>>> > >> >>>>>>> Also, a number of things that we can do with the 'sized' > versions > >> >>>>>>> above, like working with evil unsized c-style arrays directly > >> >>>>>>> inline at the > >> >>>>>>> end of the structure cease to be possible, so it isn't even a > pure > >> >>>>>>> win if we > >> >>>>>>> did the engineering effort. > >> >>>>>>> > >> >>>>>>> I think 90% of the needs I have are covered just by adding the > one > >> >>>>>>> primitive. The last 10% gets pretty invasive. > >> >>>>>>> > >> >>>>>>> -Edward > >> >>>>>>> > >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton < > rrnewton at gmail.com> > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> I like the possibility of a general solution for mutable > structs > >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's > hard. > >> >>>>>>>> > >> >>>>>>>> So, we can't unpack MutVar into constructors because of object > >> >>>>>>>> identity problems. But what about directly supporting an > >> >>>>>>>> extensible set of > >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) > >> >>>>>>>> MutVar#? That > >> >>>>>>>> may be too much work, but is it problematic otherwise? > >> >>>>>>>> > >> >>>>>>>> Needless to say, this is also critical if we ever want best in > >> >>>>>>>> class > >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential > >> >>>>>>>> counterparts. > >> >>>>>>>> > >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones > >> >>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take this email and turn it into a > short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and > >> >>>>>>>>> maybe > >> >>>>>>>>> make a ticket for it. 
> >> >>>>>>>>>
> >> >>>>>>>>> Thanks
> >> >>>>>>>>>
> >> >>>>>>>>> Simon
> >> >>>>>>>>>
> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com]
> >> >>>>>>>>> Sent: 27 August 2015 16:54
> >> >>>>>>>>> To: Simon Peyton Jones
> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs
> >> >>>>>>>>> Subject: Re: ArrayArrays
> >> >>>>>>>>>
> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It
> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s.
> >> >>>>>>>>>
> >> >>>>>>>>> While those live in #, they are garbage collected objects, so this
> >> >>>>>>>>> all lives on the heap.
> >> >>>>>>>>>
> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has to
> >> >>>>>>>>> deal with nested arrays.
> >> >>>>>>>>>
> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing.
> >> >>>>>>>>>
> >> >>>>>>>>> The Problem
> >> >>>>>>>>> -----------------
> >> >>>>>>>>>
> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked list
> >> >>>>>>>>> in Haskell.
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL))
> >> >>>>>>>>>
> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers on
> >> >>>>>>>>> the heap.
> >> >>>>>>>>>
> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe
> >> >>>>>>>>> DLL ~> DLL
> >> >>>>>>>>>
> >> >>>>>>>>> That is 3 levels of indirection.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim one by simply unpacking the IORef with > >> >>>>>>>>> -funbox-strict-fields or UNPACK > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and > >> >>>>>>>>> worsening our representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> but now we're still stuck with a level of indirection > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This means that every operation we perform on this structure > >> >>>>>>>>> will > >> >>>>>>>>> be about half of the speed of an implementation in most other > >> >>>>>>>>> languages > >> >>>>>>>>> assuming we're memory bound on loading things into cache! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Making Progress > >> >>>>>>>>> > >> >>>>>>>>> ---------------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I have been working on a number of data structures where the > >> >>>>>>>>> indirection of going from something in * out to an object in # > >> >>>>>>>>> which > >> >>>>>>>>> contains the real pointer to my target and coming back > >> >>>>>>>>> effectively doubles > >> >>>>>>>>> my runtime. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the > >> >>>>>>>>> MutVar# > >> >>>>>>>>> onto the mutable list when we dirty it. There is a well > defined > >> >>>>>>>>> write-barrier. 
> >> >>>>>>>>>
> >> >>>>>>>>> I could change out the representation to use
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil
> >> >>>>>>>>>
> >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, but
> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount of distinct
> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per object to 2.
> >> >>>>>>>>>
> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the array
> >> >>>>>>>>> object and then chase it to the next DLL and chase that to the next array. I
> >> >>>>>>>>> do get my two pointers together in memory though. I'm paying for a card
> >> >>>>>>>>> marking table as well, which I don't particularly need with just two
> >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" machinery added
> >> >>>>>>>>> back in 7.10, which is just the old array code as a new data type, which can
> >> >>>>>>>>> speed things up a bit when you don't have very big arrays:
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil
> >> >>>>>>>>>
> >> >>>>>>>>> But what if I wanted my object itself to live in # and have two
> >> >>>>>>>>> mutable fields and be able to share the same write barrier?
> >> >>>>>>>>>
> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types.
> >> >>>>>>>>> What
> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the impedance
> >> >>>>>>>>> mismatch between the imperative world and Haskell, and then just let the
> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays.
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld)
> >> >>>>>>>>>
> >> >>>>>>>>> Now I need to make up a new Nil, which I can just make be a special
> >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even abuse pattern
> >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to make this
> >> >>>>>>>>> cheaper.
> >> >>>>>>>>>
> >> >>>>>>>>> Then I can use the readMutableArrayArray# and
> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding and next
> >> >>>>>>>>> entry in the linked list.
> >> >>>>>>>>>
> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a
> >> >>>>>>>>> strict world, and everything there lives in #.
> >> >>>>>>>>>
> >> >>>>>>>>> next :: DLL -> IO DLL
> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of
> >> >>>>>>>>>   (# s', n #) -> (# s', DLL n #)
> >> >>>>>>>>>
> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to
> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty easily when they
> >> >>>>>>>>> are known strict and you chain operations of this sort!
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Cleaning it Up > >> >>>>>>>>> > >> >>>>>>>>> ------------------ > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Now I have one outermost indirection pointing to an array that > >> >>>>>>>>> points directly to other arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I > can > >> >>>>>>>>> fix > >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using > a > >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a > >> >>>>>>>>> mixture of > >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data > structure. > >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the > >> >>>>>>>>> existing > >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the > >> >>>>>>>>> arguments it > >> >>>>>>>>> takes. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that > >> >>>>>>>>> would > >> >>>>>>>>> be best left unboxed. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently > >> >>>>>>>>> at > >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a > >> >>>>>>>>> boxed or at a > >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int > in > >> >>>>>>>>> question in > >> >>>>>>>>> there. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to > >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having > to > >> >>>>>>>>> go off to > >> >>>>>>>>> the side costs me the entire win from avoiding the first > pointer > >> >>>>>>>>> chase. 
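The "masks" mentioned here are the standard hash-array-mapped-trie
bookkeeping: a bitmap of populated branches kept beside a compact child
array. A small self-contained illustration (the name `childSlot` and the
16-bit fan-out are mine, chosen for the example):

```haskell
import Data.Bits (popCount, testBit, shiftL, (.&.))
import Data.Word (Word16)

-- A HAMT node pairs a bitmap of populated branches with a compact
-- array holding only those branches. Given a hash fragment, the
-- branch's slot in the compact array is the count of populated
-- branches below it -- which is why the mask must be cheap to reach.
childSlot :: Word16 -> Int -> Maybe Int
childSlot bitmap frag
  | testBit bitmap frag = Just (popCount (bitmap .&. (b - 1)))
  | otherwise           = Nothing  -- no branch for this fragment
  where b = 1 `shiftL` frag :: Word16
```

Having to fetch the bitmap through a separate heap object on every step of
the walk is the "going off to the side" cost Edward describes.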
> >> >>>>>>>>>
> >> >>>>>>>>> But if, as Ryan suggested, we had a heap object we could
> >> >>>>>>>>> construct that had n words with unsafe access and m pointers to
> >> >>>>>>>>> other heap objects, one that could put itself on the mutable
> >> >>>>>>>>> list when any of those pointers changed, then I could shed this
> >> >>>>>>>>> last factor of two in all circumstances.
> >> >>>>>>>>>
> >> >>>>>>>>> Prototype
> >> >>>>>>>>> -------------
> >> >>>>>>>>>
> >> >>>>>>>>> Over the last few days I've put together a small prototype
> >> >>>>>>>>> implementation with a few non-trivial imperative data structures
> >> >>>>>>>>> for things like Tarjan's link-cut trees, the list labeling
> >> >>>>>>>>> problem and order-maintenance.
> >> >>>>>>>>>
> >> >>>>>>>>> https://github.com/ekmett/structs
> >> >>>>>>>>>
> >> >>>>>>>>> Notable bits:
> >> >>>>>>>>>
> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of
> >> >>>>>>>>> link-cut trees in this style.
> >> >>>>>>>>>
> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that
> >> >>>>>>>>> make it go fast.
> >> >>>>>>>>>
> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost
> >> >>>>>>>>> all the references to the LinkCut or Object data constructor get
> >> >>>>>>>>> optimized away, and we're left with beautiful strict code
> >> >>>>>>>>> directly mutating our underlying representation.
> >> >>>>>>>>>
> >> >>>>>>>>> At the very least I'll take this email and turn it into a short
> >> >>>>>>>>> article.
> >> >>>>>>>>>
> >> >>>>>>>>> -Edward
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones
> >> >>>>>>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> Just to say that I have no idea what is going on in this thread.
> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a
> >> >>>>>>>>> ticket? Is there a wiki page?
> >> >>>>>>>>>
> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a
> >> >>>>>>>>> good thing.
> >> >>>>>>>>>
> >> >>>>>>>>> Simon
> >> >>>>>>>>>
> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
> >> >>>>>>>>> Of Edward Kmett
> >> >>>>>>>>> Sent: 21 August 2015 05:25
> >> >>>>>>>>> To: Manuel M T Chakravarty
> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs
> >> >>>>>>>>> Subject: Re: ArrayArrays
> >> >>>>>>>>>
> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be
> >> >>>>>>>>> very handy as well.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider right now if I have something like an > order-maintenance > >> >>>>>>>>> structure I have: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Upper s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Lower s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The former contains, logically, a mutable integer and two > >> >>>>>>>>> pointers, > >> >>>>>>>>> one for forward and one for backwards. The latter is basically > >> >>>>>>>>> the same > >> >>>>>>>>> thing with a mutable reference up pointing at the structure > >> >>>>>>>>> above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On the heap this is an object that points to a structure for > the > >> >>>>>>>>> bytearray, and points to another structure for each mutvar > which > >> >>>>>>>>> each point > >> >>>>>>>>> to the other 'Upper' structure. So there is a level of > >> >>>>>>>>> indirection smeared > >> >>>>>>>>> over everything. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link > >> >>>>>>>>> from > >> >>>>>>>>> the structure below to the structure above. 
> >> >>>>>>>>>
> >> >>>>>>>>> Converted into ArrayArray#s I'd get
> >> >>>>>>>>>
> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s)
> >> >>>>>>>>>
> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and
> >> >>>>>>>>> the next 2 slots pointing to the previous and next objects,
> >> >>>>>>>>> represented just as their MutableArrayArray#s. I can use
> >> >>>>>>>>> sameMutableArrayArray# on these for object identity, which lets
> >> >>>>>>>>> me check for the ends of the lists by tying things back on
> >> >>>>>>>>> themselves.
> >> >>>>>>>>>
> >> >>>>>>>>> and below that
> >> >>>>>>>>>
> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s)
> >> >>>>>>>>>
> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to
> >> >>>>>>>>> an upper structure.
> >> >>>>>>>>>
> >> >>>>>>>>> I can then write a handful of combinators for getting out the
> >> >>>>>>>>> slots in question. While this gains a level of indirection
> >> >>>>>>>>> between the wrapper that puts it in * and the
> >> >>>>>>>>> MutableArrayArray# in #, that indirection can basically be
> >> >>>>>>>>> erased by GHC.
> >> >>>>>>>>>
> >> >>>>>>>>> Unlike before I don't have several separate objects on the heap
> >> >>>>>>>>> for each thing. I only have 2 now: the MutableArrayArray# for
> >> >>>>>>>>> the object itself, and the MutableByteArray# that it references
> >> >>>>>>>>> to carry around the mutable int.
> >> >>>>>>>>>
> >> >>>>>>>>> The only pain points are
> >> >>>>>>>>>
> >> >>>>>>>>> 1.)
the aforementioned limitation that currently prevents me from
> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an
> >> >>>>>>>>> ArrayArray, leaving me in a little ghetto disconnected from the
> >> >>>>>>>>> rest of Haskell,
> >> >>>>>>>>>
> >> >>>>>>>>> and
> >> >>>>>>>>>
> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the
> >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 pointers
> >> >>>>>>>>> wide. Card marking doesn't help.
> >> >>>>>>>>>
> >> >>>>>>>>> Alternately I could just try to do really evil things and
> >> >>>>>>>>> convert the whole mess to SmallArrays and then figure out how to
> >> >>>>>>>>> unsafeCoerce my way to glory, stuffing the #'d references to the
> >> >>>>>>>>> other arrays directly into the SmallArray as slots, removing the
> >> >>>>>>>>> limitation we see here by aping the MutableArrayArray# s API,
> >> >>>>>>>>> but that gets really, really dangerous!
> >> >>>>>>>>>
> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the
> >> >>>>>>>>> altar of speed here, but I'd like to be able to let the GC move
> >> >>>>>>>>> them and collect them, which rules out simpler Ptr and Addr
> >> >>>>>>>>> based solutions.
> >> >>>>>>>>>
> >> >>>>>>>>> -Edward
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty
> >> >>>>>>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> That's an interesting idea.
> >> >>>>>>>>>
> >> >>>>>>>>> Manuel
> >> >>>>>>>>>
> >> >>>>>>>>> > Edward Kmett :
> >> >>>>>>>>> >
> >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and
> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#?
The fact that the
> >> >>>>>>>>> > ArrayArray# entries are all directly unlifted, avoiding a
> >> >>>>>>>>> > level of indirection for the containing structure, is amazing,
> >> >>>>>>>>> > but I can only currently use it if my leaf level data can be
> >> >>>>>>>>> > 100% unboxed and distributed among ByteArray#s. It'd be nice
> >> >>>>>>>>> > to have the ability to put SmallArray# a stuff down at the
> >> >>>>>>>>> > leaves to hold lifted contents.
> >> >>>>>>>>> >
> >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to
> >> >>>>>>>>> > access one of the fields it'll lie to me, but I suppose it'd
> >> >>>>>>>>> > do that if I tried to use one of the members that held a
> >> >>>>>>>>> > nested ArrayArray# as a ByteArray# anyway, so it isn't like
> >> >>>>>>>>> > there is a safety story preventing this.
> >> >>>>>>>>> >
> >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection
> >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I
> >> >>>>>>>>> > could shoehorn a number of them into ArrayArrays if this
> >> >>>>>>>>> > worked.
> >> >>>>>>>>> >
> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary
> >> >>>>>>>>> > indirection compared to C/Java, and this could reduce that
> >> >>>>>>>>> > pain to just 1 level of unnecessary indirection.
> >> >>>>>>>>> >
> >> >>>>>>>>> > -Edward
> >> >>>>>>>>>
> >> >>>>>>>>> > _______________________________________________
> >> >>>>>>>>> > ghc-devs mailing list
> >> >>>>>>>>> > ghc-devs at haskell.org
> >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >> >>>>>>>>>
> >> >>>>>>>>> _______________________________________________
> >> >>>>>>>>> ghc-devs mailing list
> >> >>>>>>>>> ghc-devs at haskell.org
> >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >> >>>>>>>
> >> >>>>>
> >> >>>
> >> >>
> >> >
> >> > _______________________________________________
> >> > ghc-devs mailing list
> >> > ghc-devs at haskell.org
> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ekmett at gmail.com  Mon Sep  7 20:16:57 2015
From: ekmett at gmail.com (Edward Kmett)
Date: Mon, 7 Sep 2015 16:16:57 -0400
Subject: ArrayArrays
In-Reply-To: <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net>
References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au>
 <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net>
 <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com>
 <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net>
Message-ID: 

I had a brief discussion with Richard during the Haskell Symposium about
how we might be able to let parametricity help a bit in reducing the space
of necessary primops to a slightly more manageable level.

Notably, it'd be interesting to explore the ability to allow parametricity
over the portion of # that is just a gcptr.

We could do this if the levity polymorphism machinery was tweaked a bit.
You could envision the ability to abstract over things in both * and the subset of # that are represented by a gcptr, then modifying the existing array primitives to be parametric in that choice of levity for their argument so long as it was of a "heap object" levity. This could make the menagerie of ways to pack {Small}{Mutable}Array{Array}# references into a {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the need for folks to descend into the use of the more evil structure primitives we're talking about, and letting us keep a few more principles around us. Then in the cases like `atomicModifyMutVar#` where it needs to actually be in * rather than just a gcptr, due to the constructed field selectors it introduces on the heap then we could keep the existing less polymorphic type. -Edward On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones wrote: > It was fun to meet and discuss this. > > > > Did someone volunteer to write a wiki page that describes the proposed > design? And, I earnestly hope, also describes the menagerie of currently > available array types and primops so that users can have some chance of > picking the right one?! > > > > Thanks > > > > Simon > > > > *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan > Newton > *Sent:* 31 August 2015 23:11 > *To:* Edward Kmett; Johan Tibell > *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; > Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays > > > > Dear Edward, Ryan Yates, and other interested parties -- > > > > So when should we meet up about this? > > > > May I propose the Tues afternoon break for everyone at ICFP who is > interested in this topic? We can meet out in the coffee area and > congregate around Edward Kmett, who is tall and should be easy to find ;-). > > > > I think Ryan is going to show us how to use his new primops for combined > array + other fields in one heap object? 
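The kind of signature Edward's gcptr parametricity would enable can be
sketched as follows. This is a hypothetical type written purely for
illustration — it is not a signature from any GHC release:

```haskell
{- A sketch of a levity-polymorphic array read primop. Here 'a' may be
   instantiated at any boxed type -- lifted, or an unlifted gcptr such
   as another MutableArray# -- so arrays-of-arrays typecheck directly,
   without a separate ArrayArray# type or any unsafeCoerce:

   readMutableArray#
     :: forall (l :: Levity) s (a :: TYPE ('BoxedRep l))
      . MutableArray# s a -> Int# -> State# s -> (# State# s, a #)
-}
```

The `atomicModifyMutVar#` caveat above is why this cannot be applied
uniformly: that primop builds selector thunks, so its argument genuinely
must be lifted.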
> > > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: > > Without a custom primitive it doesn't help much there, you have to store > the indirection to the mask. > > > > With a custom primitive it should cut the on heap root-to-leaf path of > everything in the HAMT in half. A shorter HashMap was actually one of the > motivating factors for me doing this. It is rather astoundingly difficult > to beat the performance of HashMap, so I had to start cheating pretty > badly. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > wrote: > > I'd also be interested to chat at ICFP to see if I can use this for my > HAMT implementation. > > > > On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote: > > Sounds good to me. Right now I'm just hacking up composable accessors for > "typed slots" in a fairly lens-like fashion, and treating the set of slots > I define and the 'new' function I build for the data type as its API, and > build atop that. This could eventually graduate to template-haskell, but > I'm not entirely satisfied with the solution I have. I currently > distinguish between what I'm calling "slots" (things that point directly to > another SmallMutableArrayArray# sans wrapper) and "fields" which point > directly to the usual Haskell data types because unifying the two notions > meant that I couldn't lift some coercions out "far enough" to make them > vanish. > > > > I'll be happy to run through my current working set of issues in person > and -- as things get nailed down further -- in a longer lived medium than > in personal conversations. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote: > > I'd also love to meet up at ICFP and discuss this. I think the array > primops plus a TH layer that lets (ab)use them many times without too much > marginal cost sounds great. And I'd like to learn how we could be either > early users of, or help with, this infrastructure. 
> > > > CC'ing in Ryan Scot and Omer Agacan who may also be interested in dropping > in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student who is > currently working on concurrent data structures in Haskell, but will not be > at ICFP. > > > > > > On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote: > > I completely agree. I would love to spend some time during ICFP and > friends talking about what it could look like. My small array for STM > changes for the RTS can be seen here [1]. It is on a branch somewhere > between 7.8 and 7.10 and includes irrelevant STM bits and some > confusing naming choices (sorry), but should cover all the details > needed to implement it for a non-STM context. The biggest surprise > for me was following small array too closely and having a word/byte > offset miss-match [2]. > > [1]: > https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut > [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 > > Ryan > > > On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote: > > I'd love to have that last 10%, but its a lot of work to get there and > more > > importantly I don't know quite what it should look like. > > > > On the other hand, I do have a pretty good idea of how the primitives > above > > could be banged out and tested in a long evening, well in time for 7.12. > And > > as noted earlier, those remain useful even if a nicer typed version with > an > > extra level of indirection to the sizes is built up after. > > > > The rest sounds like a good graduate student project for someone who has > > graduate students lying around. Maybe somebody at Indiana University who > has > > an interest in type theory and parallelism can find us one. =) > > > > -Edward > > > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates wrote: > >> > >> I think from my perspective, the motivation for getting the type > >> checker involved is primarily bringing this to the level where users > >> could be expected to build these structures. 
it is reasonable to
> >> think that there are people who want to use STM (a context with
> >> mutation already) to implement a straightforward data structure that
> >> avoids the extra indirection penalty. There should be some places
> >> where knowing that things are field accesses rather than array
> >> indexing could be helpful, but I think GHC is good right now about
> >> handling constant offsets. In my code I don't do any bounds checking
> >> as I know I will only be accessing my arrays with constant indexes. I
> >> make wrappers for each field access and leave all the unsafe stuff in
> >> there. When things go wrong though, the compiler is no help. Maybe
> >> Template Haskell that generates the appropriate wrappers is the right
> >> direction to go.
> >> There is another benefit for me when working with these as arrays in
> >> that it is quite simple and direct (given the hoops already jumped
> >> through) to play with alignment. I can ensure two pointers are never
> >> on the same cache-line by just spacing things out in the array.
> >>
> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett wrote:
> >> > They just segfault at this level. ;)
> >> >
> >> > Sent from my iPhone
> >> >
> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton wrote:
> >> >
> >> > You presumably also save a bounds check on reads by hard-coding the
> >> > sizes?
> >> >
> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett wrote:
> >> >>
> >> >> Also there are 4 different "things" here, basically depending on two
> >> >> independent questions:
> >> >>
> >> >> a.) if you want to shove the sizes into the info table, and
> >> >> b.) if you want cardmarking.
> >> >>
> >> >> Versions with/without cardmarking for different sizes can be done
> >> >> pretty easily, but as noted, the infotable variants are pretty
> >> >> invasive.
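Ryan's cache-line trick reduces to index arithmetic once the array is
over-allocated. A small sketch, under the assumption of 64-byte cache
lines and 8-byte (word-sized) array slots; the names are mine, for
illustration:

```haskell
-- Spacing entries so that no two logical slots share a cache line:
-- with 8-byte slots and 64-byte lines, stride by 8 physical slots.
slotsPerLine :: Int
slotsPerLine = 8

-- Physical index of logical slot i in the padded array.
paddedIndex :: Int -> Int
paddedIndex i = i * slotsPerLine

-- Physical slots to allocate for n padded logical entries.
paddedSize :: Int -> Int
paddedSize n = n * slotsPerLine
```

Trading memory for isolation this way is the usual remedy for false
sharing between concurrently mutated pointers.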
> >> >> > >> >> -Edward > >> >> > >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > wrote: > >> >>> > >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up > >> >>> if > >> >>> they were small enough and there are enough of them. You get a bit > >> >>> better > >> >>> locality of reference in terms of what fits in the first cache line > of > >> >>> them. > >> >>> > >> >>> -Edward > >> >>> > >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >> >>> wrote: > >> >>>> > >> >>>> Yes. And for the short term I can imagine places we will settle > with > >> >>>> arrays even if it means tracking lengths unnecessarily and > >> >>>> unsafeCoercing > >> >>>> pointers whose types don't actually match their siblings. > >> >>>> > >> >>>> Is there anything to recommend the hacks mentioned for fixed sized > >> >>>> array > >> >>>> objects *other* than using them to fake structs? (Much to > >> >>>> derecommend, as > >> >>>> you mentioned!) > >> >>>> > >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >> >>>> wrote: > >> >>>>> > >> >>>>> I think both are useful, but the one you suggest requires a lot > more > >> >>>>> plumbing and doesn't subsume all of the usecases of the other. > >> >>>>> > >> >>>>> -Edward > >> >>>>> > >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> So that primitive is an array like thing (Same pointed type, > >> >>>>>> unbounded > >> >>>>>> length) with extra payload. > >> >>>>>> > >> >>>>>> I can see how we can do without structs if we have arrays, > >> >>>>>> especially > >> >>>>>> with the extra payload at front. But wouldn't the general > solution > >> >>>>>> for > >> >>>>>> structs be one that that allows new user data type defs for # > >> >>>>>> types? 
> >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >> >>>>>> wrote: > >> >>>>>>> > >> >>>>>>> Some form of MutableStruct# with a known number of words and a > >> >>>>>>> known > >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting > >> >>>>>>> above, but > >> >>>>>>> where the word counts were stored in the objects themselves. > >> >>>>>>> > >> >>>>>>> Given that it'd have a couple of words for those counts it'd > >> >>>>>>> likely > >> >>>>>>> want to be something we build in addition to MutVar# rather > than a > >> >>>>>>> replacement. > >> >>>>>>> > >> >>>>>>> On the other hand, if we had to fix those numbers and build info > >> >>>>>>> tables that knew them, and typechecker support, for instance, > it'd > >> >>>>>>> get > >> >>>>>>> rather invasive. > >> >>>>>>> > >> >>>>>>> Also, a number of things that we can do with the 'sized' > versions > >> >>>>>>> above, like working with evil unsized c-style arrays directly > >> >>>>>>> inline at the > >> >>>>>>> end of the structure cease to be possible, so it isn't even a > pure > >> >>>>>>> win if we > >> >>>>>>> did the engineering effort. > >> >>>>>>> > >> >>>>>>> I think 90% of the needs I have are covered just by adding the > one > >> >>>>>>> primitive. The last 10% gets pretty invasive. > >> >>>>>>> > >> >>>>>>> -Edward > >> >>>>>>> > >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton < > rrnewton at gmail.com> > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> I like the possibility of a general solution for mutable > structs > >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's > hard. > >> >>>>>>>> > >> >>>>>>>> So, we can't unpack MutVar into constructors because of object > >> >>>>>>>> identity problems. But what about directly supporting an > >> >>>>>>>> extensible set of > >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) > >> >>>>>>>> MutVar#? 
That > >> >>>>>>>> may be too much work, but is it problematic otherwise? > >> >>>>>>>> > >> >>>>>>>> Needless to say, this is also critical if we ever want best in > >> >>>>>>>> class > >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential > >> >>>>>>>> counterparts. > >> >>>>>>>> > >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones > >> >>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take this email and turn it into a > short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and > >> >>>>>>>>> maybe > >> >>>>>>>>> make a ticket for it. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Thanks > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] > >> >>>>>>>>> Sent: 27 August 2015 16:54 > >> >>>>>>>>> To: Simon Peyton Jones > >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It > >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or > ByteArray#'s. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> While those live in #, they are garbage collected objects, so > >> >>>>>>>>> this > >> >>>>>>>>> all lives on the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has > >> >>>>>>>>> to > >> >>>>>>>>> deal with nested arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm currently abusing them as a placeholder for a better > thing. 
> >> >>>>>>>>>
> >> >>>>>>>>> The Problem
> >> >>>>>>>>> -----------------
> >> >>>>>>>>>
> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked
> >> >>>>>>>>> list in Haskell.
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL))
> >> >>>>>>>>>
> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers
> >> >>>>>>>>> on the heap.
> >> >>>>>>>>>
> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~>
> >> >>>>>>>>> Maybe DLL ~> DLL
> >> >>>>>>>>>
> >> >>>>>>>>> That is 3 levels of indirection.
> >> >>>>>>>>>
> >> >>>>>>>>> We can trim one by simply unpacking the IORef with
> >> >>>>>>>>> -funbox-strict-fields or UNPACK
> >> >>>>>>>>>
> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and
> >> >>>>>>>>> worsening our representation.
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil
> >> >>>>>>>>>
> >> >>>>>>>>> but now we're still stuck with a level of indirection
> >> >>>>>>>>>
> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL
> >> >>>>>>>>>
> >> >>>>>>>>> This means that every operation we perform on this structure
> >> >>>>>>>>> will be about half of the speed of an implementation in most
> >> >>>>>>>>> other languages, assuming we're memory bound on loading things
> >> >>>>>>>>> into cache!
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Making Progress > >> >>>>>>>>> > >> >>>>>>>>> ---------------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I have been working on a number of data structures where the > >> >>>>>>>>> indirection of going from something in * out to an object in # > >> >>>>>>>>> which > >> >>>>>>>>> contains the real pointer to my target and coming back > >> >>>>>>>>> effectively doubles > >> >>>>>>>>> my runtime. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the > >> >>>>>>>>> MutVar# > >> >>>>>>>>> onto the mutable list when we dirty it. There is a well > defined > >> >>>>>>>>> write-barrier. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I could change out the representation to use > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, > >> >>>>>>>>> but > >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount > of > >> >>>>>>>>> distinct > >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per > >> >>>>>>>>> object to 2. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the > >> >>>>>>>>> array > >> >>>>>>>>> object and then chase it to the next DLL and chase that to the > >> >>>>>>>>> next array. I > >> >>>>>>>>> do get my two pointers together in memory though. 
I'm paying for
> >> >>>>>>>>> a card marking table as well, which I don't particularly need
> >> >>>>>>>>> with just two pointers, but we can shed that with the
> >> >>>>>>>>> "SmallMutableArray#" machinery added back in 7.10, which is just
> >> >>>>>>>>> the old array code as a new data type, which can speed things up
> >> >>>>>>>>> a bit when you don't have very big arrays:
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil
> >> >>>>>>>>>
> >> >>>>>>>>> But what if I wanted my object itself to live in # and have two
> >> >>>>>>>>> mutable fields and be able to share the same write barrier?
> >> >>>>>>>>>
> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types.
> >> >>>>>>>>> What if we have one # -> * wrapper on the outside to deal with
> >> >>>>>>>>> the impedance mismatch between the imperative world and Haskell,
> >> >>>>>>>>> and then just let the ArrayArray#'s hold other arrayarrays.
> >> >>>>>>>>>
> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld)
> >> >>>>>>>>>
> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a
> >> >>>>>>>>> special MutableArrayArray# I allocate on program startup. I can
> >> >>>>>>>>> even abuse pattern synonyms. Alternately I can exploit the
> >> >>>>>>>>> internals further to make this cheaper.
> >> >>>>>>>>>
> >> >>>>>>>>> Then I can use the readMutableArrayArray# and
> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding
> >> >>>>>>>>> and next entry in the linked list.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' > into a > >> >>>>>>>>> strict world, and everything there lives in #. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> next :: DLL -> IO DLL > >> >>>>>>>>> > >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of > >> >>>>>>>>> > >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code > to > >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty > >> >>>>>>>>> easily when they > >> >>>>>>>>> are known strict and you chain operations of this sort! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Cleaning it Up > >> >>>>>>>>> > >> >>>>>>>>> ------------------ > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Now I have one outermost indirection pointing to an array that > >> >>>>>>>>> points directly to other arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I > can > >> >>>>>>>>> fix > >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using > a > >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a > >> >>>>>>>>> mixture of > >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data > structure. > >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the > >> >>>>>>>>> existing > >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the > >> >>>>>>>>> arguments it > >> >>>>>>>>> takes. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that > >> >>>>>>>>> would > >> >>>>>>>>> be best left unboxed. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently > >> >>>>>>>>> at > >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a > >> >>>>>>>>> boxed or at a > >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int > in > >> >>>>>>>>> question in > >> >>>>>>>>> there. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to > >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having > to > >> >>>>>>>>> go off to > >> >>>>>>>>> the side costs me the entire win from avoiding the first > pointer > >> >>>>>>>>> chase. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could > >> >>>>>>>>> construct that had n words with unsafe access and m pointers > to > >> >>>>>>>>> other heap > >> >>>>>>>>> objects, one that could put itself on the mutable list when > any > >> >>>>>>>>> of those > >> >>>>>>>>> pointers changed then I could shed this last factor of two in > >> >>>>>>>>> all > >> >>>>>>>>> circumstances. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Prototype > >> >>>>>>>>> > >> >>>>>>>>> ------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Over the last few days I've put together a small prototype > >> >>>>>>>>> implementation with a few non-trivial imperative data > structures > >> >>>>>>>>> for things > >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and > >> >>>>>>>>> order-maintenance. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> https://github.com/ekmett/structs > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Notable bits: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of > >> >>>>>>>>> link-cut > >> >>>>>>>>> trees in this style. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that > >> >>>>>>>>> make > >> >>>>>>>>> it go fast. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost > >> >>>>>>>>> all > >> >>>>>>>>> the references to the LinkCut or Object data constructor get > >> >>>>>>>>> optimized away, > >> >>>>>>>>> and we're left with beautiful strict code directly mutating > our > >> >>>>>>>>> underlying > >> >>>>>>>>> representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take this email and turn it into a > short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones > >> >>>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> Just to say that I have no idea what is going on in this > thread. > >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a > >> >>>>>>>>> ticket? Is > >> >>>>>>>>> there a wiki page? > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a > >> >>>>>>>>> good > >> >>>>>>>>> thing. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On > Behalf > >> >>>>>>>>> Of > >> >>>>>>>>> Edward Kmett > >> >>>>>>>>> Sent: 21 August 2015 05:25 > >> >>>>>>>>> To: Manuel M T Chakravarty > >> >>>>>>>>> Cc: Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would > be > >> >>>>>>>>> very handy as well. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider right now if I have something like an > order-maintenance > >> >>>>>>>>> structure I have: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Upper s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Lower s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The former contains, logically, a mutable integer and two > >> >>>>>>>>> pointers, > >> >>>>>>>>> one for forward and one for backwards. The latter is basically > >> >>>>>>>>> the same > >> >>>>>>>>> thing with a mutable reference up pointing at the structure > >> >>>>>>>>> above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On the heap this is an object that points to a structure for > the > >> >>>>>>>>> bytearray, and points to another structure for each mutvar > which > >> >>>>>>>>> each point > >> >>>>>>>>> to the other 'Upper' structure. So there is a level of > >> >>>>>>>>> indirection smeared > >> >>>>>>>>> over everything. 
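[As a picture, the boxed layout just described looks roughly like this; each named box is a separate heap object and each arrow a pointer dereference:]

```haskell
-- Boxed layout of Upper:
--
--   Upper ──▶ MutableByteArray    (the mutable integer)
--         ──▶ MutVar ──▶ Upper    (backwards)
--         ──▶ MutVar ──▶ Upper    (forwards)
--
-- Every mutable field costs an extra heap object and an extra hop,
-- versus the MutableArrayArray# version where all three fields sit
-- directly in one array object.
```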
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link > >> >>>>>>>>> from > >> >>>>>>>>> the structure below to the structure above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Converted into ArrayArray#s I'd get > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and > >> >>>>>>>>> the > >> >>>>>>>>> next 2 slots pointing to the previous and next previous > objects, > >> >>>>>>>>> represented > >> >>>>>>>>> just as their MutableArrayArray#s. I can use > >> >>>>>>>>> sameMutableArrayArray# on these > >> >>>>>>>>> for object identity, which lets me check for the ends of the > >> >>>>>>>>> lists by tying > >> >>>>>>>>> things back on themselves. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and below that > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up > to > >> >>>>>>>>> an > >> >>>>>>>>> upper structure. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can then write a handful of combinators for getting out the > >> >>>>>>>>> slots > >> >>>>>>>>> in question, while it has gained a level of indirection > between > >> >>>>>>>>> the wrapper > >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can > >> >>>>>>>>> be basically > >> >>>>>>>>> erased by ghc. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Unlike before I don't have several separate objects on the > heap > >> >>>>>>>>> for > >> >>>>>>>>> each thing. I only have 2 now. 
The MutableArrayArray# for the > >> >>>>>>>>> object itself, > >> >>>>>>>>> and the MutableByteArray# that it references to carry around > the > >> >>>>>>>>> mutable > >> >>>>>>>>> int. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The only pain points are > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me > >> >>>>>>>>> from > >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into > an > >> >>>>>>>>> ArrayArray > >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of > >> >>>>>>>>> Haskell, > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid > the > >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 > pointers > >> >>>>>>>>> wide. Card > >> >>>>>>>>> marking doesn't help. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Alternately I could just try to do really evil things and > >> >>>>>>>>> convert > >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to > >> >>>>>>>>> unsafeCoerce my way > >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays > >> >>>>>>>>> directly into the > >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by > >> >>>>>>>>> aping the > >> >>>>>>>>> MutableArrayArray# s API, but that gets really really > dangerous! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the > >> >>>>>>>>> altar > >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them > >> >>>>>>>>> and collect > >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. 
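[The "really evil" coercion being contemplated would look something like the sketch below: intentionally unsafe, and the names are invented here, not an endorsed API.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- Smuggle one SmallMutableArray# into a slot of another by coercing it
-- through a lifted type. Nothing checks this at the use site: reading
-- the slot back at any other type silently corrupts the heap.
writeChild :: SmallMutableArray# RealWorld Any -> Int#
           -> SmallMutableArray# RealWorld Any -> IO ()
writeChild parent i child =
  IO (\s -> (# writeSmallArray# parent i (unsafeCoerce# child) s, () #))
```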
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty > >> >>>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> That?s an interesting idea. > >> >>>>>>>>> > >> >>>>>>>>> Manuel > >> >>>>>>>>> > >> >>>>>>>>> > Edward Kmett : > >> >>>>>>>>> > >> >>>>>>>>> > > >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and > >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the > >> >>>>>>>>> > ArrayArray# entries > >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection > for > >> >>>>>>>>> > the containing > >> >>>>>>>>> > structure is amazing, but I can only currently use it if my > >> >>>>>>>>> > leaf level data > >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd > be > >> >>>>>>>>> > nice to be > >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at > >> >>>>>>>>> > the leaves to > >> >>>>>>>>> > hold lifted contents. > >> >>>>>>>>> > > >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to > >> >>>>>>>>> > access > >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do > that > >> >>>>>>>>> > if i tried to > >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a > >> >>>>>>>>> > ByteArray# > >> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing > >> >>>>>>>>> > this. > >> >>>>>>>>> > > >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection > >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I > >> >>>>>>>>> > could shoehorn a > >> >>>>>>>>> > number of them into ArrayArrays if this worked. 
> >> >>>>>>>>> > > >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary > >> >>>>>>>>> > indirection compared to c/java and this could reduce that > pain > >> >>>>>>>>> > to just 1 > >> >>>>>>>>> > level of unnecessary indirection. > >> >>>>>>>>> > > >> >>>>>>>>> > -Edward > >> >>>>>>>>> > >> >>>>>>>>> > _______________________________________________ > >> >>>>>>>>> > ghc-devs mailing list > >> >>>>>>>>> > ghc-devs at haskell.org > >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> _______________________________________________ > >> >>>>>>>>> ghc-devs mailing list > >> >>>>>>>>> ghc-devs at haskell.org > >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> >>>>>>> > >> >>>>>>> > >> >>>>> > >> >>> > >> >> > >> > > >> > > >> > _______________________________________________ > >> > ghc-devs mailing list > >> > ghc-devs at haskell.org > >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> > > > > > > > > > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan.doel at gmail.com Mon Sep 7 20:19:59 2015 From: dan.doel at gmail.com (Dan Doel) Date: Mon, 7 Sep 2015 16:19:59 -0400 Subject: Unlifted data types In-Reply-To: <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> Message-ID: On Mon, Sep 7, 2015 at 4:00 PM, Simon Peyton Jones wrote: > (2) Second, we cannot expect levity polymorphism. Consider > map f (x:xs) = f x : map f xs > Is the (f x) a thunk or is it evaluated strictly? 
Unless you are going to clone the code for map (which levity polymorphism is there to avoid), we can't answer "it depends on the type of (f x)". So, no, I think levity polymorphism is out. > > So I vote against splitting # into two: plain will do just fine. I don't understand how that last bit follows from the previous stuff (or, I don't understand the sentence). Splitting # into two kinds is useful even if functions can't be levity polymorphic. # contains a bunch of types that aren't represented uniformly. Int# might be 32 bits while Double# is 64, etc. But Unlifted would contain only types that are uniformly represented as pointers, so you could write functions that are polymorphic over types of kind Unlifted. This is not true for Unboxed/# (unless we implement C++ style polymorphism-as-code-generation). ---- Also, with regard to the previous mail, it's not true that `suspend` has to be a special form. All expressions with types of kind * are 'special forms' in the necessary sense. -- Dan From mail at joachim-breitner.de Mon Sep 7 20:21:14 2015 From: mail at joachim-breitner.de (Joachim Breitner) Date: Mon, 07 Sep 2015 22:21:14 +0200 Subject: AnonymousSums data con syntax In-Reply-To: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> References: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <1441657274.28403.7.camel@joachim-breitner.de> Hi, Am Montag, den 07.09.2015, 19:25 +0000 schrieb Simon Peyton Jones: > > Are we okay with stealing some operator sections for this? E.G. (x > > > > ). I think the boxed sums larger than 2 choices are all technically overlapping with sections. > > I hadn't thought of that. I suppose that in distfix notation we > could require spaces > (x | |) > since vertical bar by itself isn't an operator. But then (_||) x > might feel more compact. > > Also a section (x ||) isn't valid in a pattern, so we would not need > to require spaces there. 
> > But my gut feel is: yes, with AnonymousSums we should just steal the > syntax. It won't hurt existing code (since it won't use > AnonymousSums), and if you *are* using AnonymousSums then the distfix > notation is probably more valuable than the sections for an operator > you probably aren't using. I wonder if this syntax for constructors is really that great. Yes, you there is similarly with the type constructor (which is nice), but for the data constructor, do we really want an unary encoding and have our users count bars? I believe the user (and also us, having to read core) would be better served by some syntax that involves plain numbers. Given that of is already a keyword, how about something involving "3 of 4"? For example (Put# True in 3 of 5) :: (# a | b | Bool | d | e #) and case sum of (Put# x in 1 of 3) -> ... (Put# x in 2 of 3) -> ... (Put# x in 3 of 3) -> ... (If "as" were a keyword, (Put# x as 2 of 3) would sound even better.) I don?t find this particular choice very great, but something with numbers rather than ASCII art seems to make more sense here. Is there something even better? Greetings, Joachim -- Joachim ?nomeata? Breitner mail at joachim-breitner.de ? http://www.joachim-breitner.de/ Jabber: nomeata at joachim-breitner.de ? GPG-Key: 0xF0FBF51F Debian Developer: nomeata at debian.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From mike at izbicki.me Mon Sep 7 20:26:53 2015 From: mike at izbicki.me (Mike Izbicki) Date: Mon, 7 Sep 2015 13:26:53 -0700 Subject: question about GHC API on GHC plugin In-Reply-To: References: <1439014742-sup-2126@sabre> Message-ID: I have another question :) This one relates to Andrew Farmer's answer a while back on how to build dictionaries given a Concrete type. 
Everything I have works when I use my own numeric hierarchy, but when I use the Prelude's numeric hierarchy, GHC can't find the `Num Float` instance (or any other builtin instance). I created the following function (based on HERMIT's buildDictionary function) to build my dictionaries (for GHC 7.10.1): -- | Given a function name and concrete type, get the needed dictionary. getDictConcrete :: ModGuts -> String -> Type -> CoreM (Maybe (Expr CoreBndr)) getDictConcrete guts opstr t = trace ("getDictConcrete "++opstr) $ do hscenv <- getHscEnv dflags <- getDynFlags eps <- liftIO $ hscEPS hscenv let (opname,ParentIs classname) = getNameParent guts opstr classType = mkTyConTy $ case lookupNameEnv (eps_PTE eps) classname of Just (ATyCon t) -> t Just (AnId _) -> error "loopupNameEnv AnId" Just (AConLike _) -> error "loopupNameEnv AConLike" Just (ACoAxiom _) -> error "loopupNameEnv ACoAxiom" Nothing -> error "getNameParent gutsEnv Nothing" dictType = mkAppTy classType t dictVar = mkGlobalVar VanillaId (mkSystemName (mkUnique 'z' 1337) (mkVarOcc $ "magicDictionaryName")) dictType vanillaIdInfo bnds <- runTcM guts $ do loc <- getCtLoc $ GivenOrigin UnkSkol let nonC = mkNonCanonical $ CtWanted { ctev_pred = dictType , ctev_evar = dictVar , ctev_loc = loc } wCs = mkSimpleWC [nonC] (x, evBinds) <- solveWantedsTcM wCs bnds <- initDsTc $ dsEvBinds evBinds liftIO $ do putStrLn $ "dictType="++showSDoc dflags (ppr dictType) putStrLn $ "dictVar="++showSDoc dflags (ppr dictVar) putStrLn $ "nonC="++showSDoc dflags (ppr nonC) putStrLn $ "wCs="++showSDoc dflags (ppr wCs) putStrLn $ "bnds="++showSDoc dflags (ppr bnds) putStrLn $ "x="++showSDoc dflags (ppr x) return bnds case bnds of [NonRec _ dict] -> return $ Just dict otherwise -> return Nothing When I use my own numeric class hierarchy, this works great! But when I use the Prelude numeric hierarchy, this doesn't work for some reason. 
In particular, if I pass `+` as the operation I want a dictionary for on the type `Float`, then the function returns `Nothing` with the following output: getDictConcrete + dictType=Num Float dictVar=magicDictionaryName_zlz nonC=[W] magicDictionaryName_zlz :: Num Float (CNonCanonical) wCs=WC {wc_simple = [W] magicDictionaryName_zlz :: Num Float (CNonCanonical)} bnds=[] x=WC {wc_simple = [W] magicDictionaryName_zlz :: Num Float (CNonCanonical)} If I change the `solveWantedTcMs` function to `simplifyInteractive`, then GHC panics with the following message: Top level: No instance for (GHC.Num.Num GHC.Types.Float) arising from UnkSkol Why doesn't the TcM monad know about the `Num Float` instance? On Fri, Sep 4, 2015 at 9:18 PM, ?mer Sinan A?acan wrote: > Typo: "You're parsing your code" I mean "You're passing your code" > > 2015-09-05 0:16 GMT-04:00 ?mer Sinan A?acan : >> Hi Mike, >> >> I'll try to hack an example for you some time tomorrow(I'm returning from ICFP >> and have some long flights ahead of me). >> >> But in the meantime, here's a working Core code, generated by GHC: >> >> f_rjH :: forall a_alz. Ord a_alz => a_alz -> Bool >> f_rjH = >> \ (@ a_aCH) ($dOrd_aCI :: Ord a_aCH) (eta_B1 :: a_aCH) -> >> == @ a_aCH (GHC.Classes.$p1Ord @ a_aCH $dOrd_aCI) eta_B1 eta_B1 >> >> You can clearly see here how Eq dictionary is selected from Ord >> dicitonary($dOrd_aCI in the example), it's just an application of selector to >> type and dictionary, that's all. >> >> This is generated from this code: >> >> {-# NOINLINE f #-} >> f :: Ord a => a -> Bool >> f x = x == x >> >> Compile it with this: >> >> ghc --make -fforce-recomp -O0 -ddump-simpl -ddump-to-file Main.hs >> -dsuppress-idinfo >> >>> Can anyone help me figure this out? Is there any chance this is a bug in how >>> GHC parses Core? >> >> This seems unlikely, because GHC doesn't have a Core parser and there's no Core >> parsing going on here, you're parsing your Code in the form of AST(CoreExpr, >> CoreProgram etc. 
defined in CoreSyn.hs). Did you mean something else and am I >> misunderstanding? >> >> 2015-09-04 19:39 GMT-04:00 Mike Izbicki : >>> I'm still having trouble creating Core code that can extract >>> superclass dictionaries from a given dictionary. I suspect the >>> problem is that I don't actually understand what the Core code to do >>> this is supposed to look like. I keep getting the errors mentioned >>> above when I try what I think should work. >>> >>> Can anyone help me figure this out? Is there any chance this is a bug >>> in how GHC parses Core? >>> >>> On Tue, Aug 25, 2015 at 9:24 PM, Mike Izbicki wrote: >>>> The purpose of the plugin is to automatically improve the numerical >>>> stability of Haskell code. It is supposed to identify numeric >>>> expressions, then use Herbie (https://github.com/uwplse/herbie) to >>>> generate a numerically stable version, then rewrite the numerically >>>> stable version back into the code. The first two steps were really >>>> easy. It's the last step of inserting back into the code that I'm >>>> having tons of trouble with. Core is a lot more complicated than I >>>> thought :) >>>> >>>> I'm not sure what you mean by the CoreExpr representation? 
Here's the >>>> output of the pretty printer you gave: >>>> App (App (App (App (Var Id{+,r2T,ForAllTy TyVar{a} (FunTy (TyConApp >>>> Num [TyVarTy TyVar{a}]) (FunTy (TyVarTy TyVar{a}) (FunTy (TyVarTy >>>> TyVar{a}) (TyVarTy TyVar{a})))),VanillaId,Info{0,SpecInfo [] >>>> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >>>> {strd = Lazy, absd = Use Many Used},0}}) (Type (TyVarTy TyVar{a}))) >>>> (App (Var Id{$p1Fractional,rh3,ForAllTy TyVar{a} (FunTy (TyConApp >>>> Fractional [TyVarTy TyVar{a}]) (TyConApp Num [TyVarTy >>>> TyVar{a}])),ClassOpId ,Info{1,SpecInfo [BuiltinRule {ru_name = >>>> "Class op $p1Fractional", ru_fn = $p1Fractional, ru_nargs = 2, ru_try >>>> = }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd >>>> [Str HeadStr,Lazy,Lazy,Lazy]), absd = Use Many (UProd [Use Many >>>> Used,Abs,Abs,Abs])}] (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many >>>> Used},0}}) (App (Var Id{$p1Floating,rh2,ForAllTy TyVar{a} (FunTy >>>> (TyConApp Floating [TyVarTy TyVar{a}]) (TyConApp Fractional [TyVarTy >>>> TyVar{a}])),ClassOpId ,Info{1,SpecInfo [BuiltinRule {ru_name = >>>> "Class op $p1Floating", ru_fn = $p1Floating, ru_nargs = 2, ru_try = >>>> }] ,NoUnfolding,NoCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [JD {strd = Str (SProd >>>> [Str HeadStr,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy,Lazy]), >>>> absd = Use Many (UProd [Use Many >>>> 
Used,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs,Abs])}] >>>> (Dunno NoCPR)),JD {strd = Lazy, absd = Use Many Used},0}}) (Var >>>> Id{$dFloating,aBM,TyConApp Floating [TyVarTy >>>> TyVar{a}],VanillaId,Info{0,SpecInfo [] >>>> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >>>> {strd = Lazy, absd = Use Many Used},0}})))) (Var Id{x1,anU,TyVarTy >>>> TyVar{a},VanillaId,Info{0,SpecInfo [] >>>> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >>>> {strd = Lazy, absd = Use Many Used},0}})) (Var Id{x1,anU,TyVarTy >>>> TyVar{a},VanillaId,Info{0,SpecInfo [] >>>> ,NoUnfolding,MayHaveCafRefs,NoOneShotInfo,InlinePragma >>>> {inl_src = "{-# INLINE", inl_inline = EmptyInlineSpec, inl_sat = >>>> Nothing, inl_act = AlwaysActive, inl_rule = >>>> FunLike},NoOccInfo,StrictSig (DmdType [] (Dunno NoCPR)),JD >>>> {strd = Lazy, absd = Use Many Used},0}}) >>>> >>>> You can find my pretty printer (and all the other code for the plugin) >>>> at: https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L627 >>>> >>>> The function getDictMap >>>> (https://github.com/mikeizbicki/herbie-haskell/blob/master/src/Herbie.hs#L171) >>>> is where I'm constructing the dictionaries that are getting inserted >>>> back into the Core. >>>> >>>> On Tue, Aug 25, 2015 at 7:17 PM, ?mer Sinan A?acan wrote: >>>>> It seems like in your App syntax you're having a non-function in function >>>>> position. 
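[On the earlier `Num Float` question: one way to sidestep the constraint solver entirely is to consult the instance environments directly and apply the matching instance's dictionary function. A sketch against the GHC 7.10 API; treat the exact names (`tcGetInstEnvs`, `lookupUniqueInstEnv`, `instanceDFunId`) as assumptions to check against your GHC version, and `runTcM` is the plugin's own helper from the code above.]

```haskell
-- Sketch: build a dictionary for (cls ty) by direct instance lookup,
-- avoiding solveWantedsTcM. Only handles a unique top-level match;
-- superclass selection and locally-given dictionaries are not covered.
getDictByLookup :: ModGuts -> Class -> Type -> CoreM (Maybe CoreExpr)
getDictByLookup guts cls ty = runTcM guts $ do
    instEnvs <- tcGetInstEnvs   -- home-package plus external instances
    case lookupUniqueInstEnv instEnvs cls [ty] of
        Right (inst, tys) ->
            -- the dfun applied to the instantiating types is the dictionary
            return (Just (mkApps (Var (instanceDFunId inst)) (map Type tys)))
        Left _ -> return Nothing
```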
You can see this by looking at what failing function >>>>> (splitFunTy_maybe) is doing: >>>>> >>>>> splitFunTy_maybe :: Type -> Maybe (Type, Type) >>>>> -- ^ Attempts to extract the argument and result types from a type >>>>> ... (definition is not important) ... >>>>> >>>>> Then it's used like this at the error site: >>>>> >>>>> (arg_ty, res_ty) = expectJust "cpeBody:collect_args" $ >>>>> splitFunTy_maybe fun_ty >>>>> >>>>> In your case this function is returning Nothing and then exceptJust is >>>>> signalling the panic. >>>>> >>>>> Your code looked correct to me, I don't see any problems with that. Maybe you're >>>>> using something wrong as selectors. Could you paste CoreExpr representation of >>>>> your program? >>>>> >>>>> It may also be the case that the panic is caused by something else, maybe your >>>>> syntax is invalidating some assumptions/invariants in GHC but it's not >>>>> immediately checked etc. Working at the Core level is frustrating at times. >>>>> >>>>> Can I ask what kind of plugin are you working on? >>>>> >>>>> (Btw, how did you generate this representation of AST? Did you write it >>>>> manually? If you have a pretty-printer, would you mind sharing it?) >>>>> >>>>> 2015-08-25 18:50 GMT-04:00 Mike Izbicki : >>>>>> Thanks ?mer! >>>>>> >>>>>> I'm able to get dictionaries for the superclasses of a class now, but >>>>>> I get an error whenever I try to get a dictionary for a >>>>>> super-superclass. Here's the Haskell expression I'm working with: >>>>>> >>>>>> test1 :: Floating a => a -> a >>>>>> test1 x1 = x1+x1 >>>>>> >>>>>> The original core is: >>>>>> >>>>>> + @ a $dNum_aJu x1 x1 >>>>>> >>>>>> But my plugin is replacing it with the core: >>>>>> >>>>>> + @ a ($p1Fractional ($p1Floating $dFloating_aJq)) x1 x1 >>>>>> >>>>>> The only difference is the way I'm getting the Num dictionary. The >>>>>> corresponding AST (annotated with variable names and types) is: >>>>>> >>>>>> App >>>>>> (App >>>>>> (App >>>>>> (App >>>>>> (Var +::forall a. 
Num a => a -> a -> a) >>>>>> (Type a) >>>>>> ) >>>>>> (App >>>>>> (Var $p1Fractional::forall a. Fractional a => Num a) >>>>>> (App >>>>>> (Var $p1Floating::forall a. Floating a => Fractional a) >>>>>> (Var $dFloating_aJq::Floating a) >>>>>> ) >>>>>> ) >>>>>> ) >>>>>> (Var x1::'a') >>>>>> ) >>>>>> (Var x1::'a') >>>>>> >>>>>> When I insert, GHC gives the following error: >>>>>> >>>>>> ghc: panic! (the 'impossible' happened) >>>>>> (GHC version 7.10.1 for x86_64-unknown-linux): >>>>>> expectJust cpeBody:collect_args >>>>>> >>>>>> What am I doing wrong with extracting these super-superclass >>>>>> dictionaries? I've looked up the code for cpeBody in GHC, but I can't >>>>>> figure out what it's trying to do, so I'm not sure why it's failing on >>>>>> my core. >>>>>> >>>>>> On Mon, Aug 24, 2015 at 7:10 PM, ?mer Sinan A?acan wrote: >>>>>>> Mike, here's a piece of code that may be helpful to you: >>>>>>> >>>>>>> https://github.com/osa1/sc-plugin/blob/master/src/Supercompilation/Show.hs >>>>>>> >>>>>>> Copy this module to your plugin, it doesn't have any dependencies other than >>>>>>> ghc itself. When your plugin is initialized, update `dynFlags_ref` with your >>>>>>> DynFlags as first thing to do. Then use Show instance to print AST directly. >>>>>>> >>>>>>> Horrible hack, but very useful for learning purposes. In fact, I don't know how >>>>>>> else we can learn what Core is generated for a given code, and reverse-engineer >>>>>>> to figure out details. >>>>>>> >>>>>>> Hope it helps. >>>>>>> >>>>>>> 2015-08-24 21:59 GMT-04:00 ?mer Sinan A?acan : >>>>>>>>> Lets say I'm running the plugin on a function with signature `Floating a => a >>>>>>>>> -> a`, then the plugin has access to the `Floating` dictionary for the type. >>>>>>>>> But if I want to add two numbers together, I need the `Num` dictionary. I >>>>>>>>> know I should have access to `Num` since it's a superclass of `Floating`. >>>>>>>>> How can I get access to these superclass dictionaries? 
>>>>>>>> >>>>>>>> I don't have a working code for this but this should get you started: >>>>>>>> >>>>>>>> let ord_dictionary :: Id = ... >>>>>>>> ord_class :: Class = ... >>>>>>>> in >>>>>>>> mkApps (Var (head (classSCSels ord_class))) [Var ord_dictionary] >>>>>>>> >>>>>>>> I don't know how to get Class for Ord. I do `head` here because in the case of >>>>>>>> Ord we only have one superclass so `classSCSels` should have one Id. Then I >>>>>>>> apply ord_dictionary to this selector and it should return dictionary for Eq. >>>>>>>> >>>>>>>> I assumed you already have ord_dictionary, it should be passed to your function >>>>>>>> already if you had `(Ord a) => ` in your function. >>>>>>>> >>>>>>>> >>>>>>>> Now I realized you asked for getting Num from Floating. I think you should >>>>>>>> follow a similar path except you need two applications, first to get Fractional >>>>>>>> from Floating and second to get Num from Fractional: >>>>>>>> >>>>>>>> mkApps (Var (head (classSCSels fractional_class))) >>>>>>>> [mkApps (Var (head (classSCSels floating_class))) >>>>>>>> [Var floating_dictionary]] >>>>>>>> >>>>>>>> Return value should be a Num dictionary. >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From rrnewton at gmail.com Mon Sep 7 20:27:43 2015 From: rrnewton at gmail.com (Ryan Newton) Date: Mon, 7 Sep 2015 16:27:43 -0400 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: Ah, incidentally that introduces an interesting difference between atomicModify and CAS. CAS should be able to work on mutable locations in that subset of # that are represented by a gcptr, whereas Edward pointed out that atomicModify cannot. 
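[Concretely, the boxed-array CAS primop already has the gcptr-comparison shape being described. The signature is from GHC.Prim; the IO wrapper around it is only a sketch.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- casArray# :: MutableArray# s a -> Int# -> a -> a
--           -> State# s -> (# State# s, Int#, a #)
-- The comparison is pointer equality on the boxed value, i.e. on the
-- gcptr itself, which is what makes it usable for any boxed slot.
casSlot :: MutableArray# RealWorld a -> Int# -> a -> a -> IO (Int, a)
casSlot arr i old new = IO $ \s ->
  case casArray# arr i old new s of
    -- status is 0# on success in current GHC; check GHC.Prim for
    -- the convention in your version. seen is the value observed.
    (# s', status, seen #) -> (# s', (I# status, seen) #)
```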
(Indeed, to use lock-free algorithms with these new unboxed mutable structures we'll need CAS on the slots.) On Mon, Sep 7, 2015 at 4:16 PM, Edward Kmett wrote: > I had a brief discussion with Richard during the Haskell Symposium about > how we might be able to let parametricity help a bit in reducing the space > of necessary primops to a slightly more manageable level. > > Notably, it'd be interesting to explore the ability to allow parametricity > over the portion of # that is just a gcptr. > > We could do this if the levity polymorphism machinery was tweaked a bit. > You could envision the ability to abstract over things in both * and the > subset of # that are represented by a gcptr, then modifying the existing > array primitives to be parametric in that choice of levity for their > argument so long as it was of a "heap object" levity. > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the > need for folks to descend into the use of the more evil structure > primitives we're talking about, and letting us keep a few more principles > around us. > > Then in the cases like `atomicModifyMutVar#` where it needs to actually be > in * rather than just a gcptr, due to the constructed field selectors it > introduces on the heap then we could keep the existing less polymorphic > type. > > -Edward > > On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > wrote: > >> It was fun to meet and discuss this. >> >> >> >> Did someone volunteer to write a wiki page that describes the proposed >> design? And, I earnestly hope, also describes the menagerie of currently >> available array types and primops so that users can have some chance of >> picking the right one?!
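[Editor's note: to make the CAS-on-slots point above concrete, here is a sketch written against the lifted-slot primop that exists today, `casArray#`; the proposed unlifted/gcptr version would have the same shape, just at a different kind. The `casSlot` name is made up for this sketch, and the `0# = success` reading of the result flag follows the usual convention for GHC's CAS primops.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

-- Compare-and-swap slot i: returns True on success, together with the
-- value actually observed in the slot. Note this is *pointer* equality,
-- so `old` must be the exact heap object previously read from the slot.
casSlot :: MutableArray# RealWorld a -> Int# -> a -> a -> IO (Bool, a)
casSlot arr i old new = IO $ \s ->
  case casArray# arr i old new s of
    (# s', flag, seen #) -> (# s', (isTrue# (flag ==# 0#), seen) #)
```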
>> >> >> >> Thanks >> >> >> >> Simon >> >> >> >> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan >> Newton >> *Sent:* 31 August 2015 23:11 >> *To:* Edward Kmett; Johan Tibell >> *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; >> Ryan Scott; Ryan Yates >> *Subject:* Re: ArrayArrays >> >> >> >> Dear Edward, Ryan Yates, and other interested parties -- >> >> >> >> So when should we meet up about this? >> >> >> >> May I propose the Tues afternoon break for everyone at ICFP who is >> interested in this topic? We can meet out in the coffee area and >> congregate around Edward Kmett, who is tall and should be easy to find ;-). >> >> >> >> I think Ryan is going to show us how to use his new primops for combined >> array + other fields in one heap object? >> >> >> >> On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: >> >> Without a custom primitive it doesn't help much there, you have to store >> the indirection to the mask. >> >> >> >> With a custom primitive it should cut the on heap root-to-leaf path of >> everything in the HAMT in half. A shorter HashMap was actually one of the >> motivating factors for me doing this. It is rather astoundingly difficult >> to beat the performance of HashMap, so I had to start cheating pretty >> badly. ;) >> >> >> >> -Edward >> >> >> >> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell >> wrote: >> >> I'd also be interested to chat at ICFP to see if I can use this for my >> HAMT implementation. >> >> >> >> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote: >> >> Sounds good to me. Right now I'm just hacking up composable accessors for >> "typed slots" in a fairly lens-like fashion, and treating the set of slots >> I define and the 'new' function I build for the data type as its API, and >> build atop that. This could eventually graduate to template-haskell, but >> I'm not entirely satisfied with the solution I have. 
I currently >> distinguish between what I'm calling "slots" (things that point directly to >> another SmallMutableArrayArray# sans wrapper) and "fields" which point >> directly to the usual Haskell data types because unifying the two notions >> meant that I couldn't lift some coercions out "far enough" to make them >> vanish. >> >> >> >> I'll be happy to run through my current working set of issues in person >> and -- as things get nailed down further -- in a longer lived medium than >> in personal conversations. ;) >> >> >> >> -Edward >> >> >> >> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote: >> >> I'd also love to meet up at ICFP and discuss this. I think the array >> primops plus a TH layer that lets one (ab)use them many times without too much >> marginal cost sounds great. And I'd like to learn how we could be either >> early users of, or help with, this infrastructure. >> >> >> >> CC'ing in Ryan Scott and Ömer Ağacan who may also be interested in >> dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student >> who is currently working on concurrent data structures in Haskell, but will >> not be at ICFP. >> >> >> >> >> >> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote: >> >> I completely agree. I would love to spend some time during ICFP and >> friends talking about what it could look like. My small array for STM >> changes for the RTS can be seen here [1]. It is on a branch somewhere >> between 7.8 and 7.10 and includes irrelevant STM bits and some >> confusing naming choices (sorry), but should cover all the details >> needed to implement it for a non-STM context. The biggest surprise >> for me was following small array too closely and having a word/byte >> offset mismatch [2].
>> >> [1]: >> https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut >> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 >> >> Ryan >> >> >> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote: >> > I'd love to have that last 10%, but it's a lot of work to get there and >> more >> > importantly I don't know quite what it should look like. >> > >> > On the other hand, I do have a pretty good idea of how the primitives >> above >> > could be banged out and tested in a long evening, well in time for >> 7.12. And >> > as noted earlier, those remain useful even if a nicer typed version >> with an >> > extra level of indirection to the sizes is built up after. >> > >> > The rest sounds like a good graduate student project for someone who has >> > graduate students lying around. Maybe somebody at Indiana University >> who has >> > an interest in type theory and parallelism can find us one. =) >> > >> > -Edward >> > >> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates >> wrote: >> >> >> >> I think from my perspective, the motivation for getting the type >> >> checker involved is primarily bringing this to the level where users >> >> could be expected to build these structures. It is reasonable to >> >> think that there are people who want to use STM (a context with >> >> mutation already) to implement a straightforward data structure that >> >> avoids the extra indirection penalty. There should be some places where >> >> knowing that things are field accesses rather than array indexing >> >> could be helpful, but I think GHC is good right now about handling >> >> constant offsets. In my code I don't do any bounds checking as I know >> >> I will only be accessing my arrays with constant indexes. I make >> >> wrappers for each field access and leave all the unsafe stuff in >> >> there. When things go wrong though, the compiler is no help. Maybe >> >> Template Haskell that generates the appropriate wrappers is the right >> >> direction to go.
>> >> There is another benefit for me when working with these as arrays in >> >> that it is quite simple and direct (given the hoops already jumped >> >> through) to play with alignment. I can ensure two pointers are never >> >> on the same cache-line by just spacing things out in the array. >> >> >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett >> wrote: >> >> > They just segfault at this level. ;) >> >> > >> >> > Sent from my iPhone >> >> > >> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton wrote: >> >> > >> >> > You presumably also save a bounds check on reads by hard-coding the >> >> > sizes? >> >> > >> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett >> wrote: >> >> >> >> >> >> Also there are 4 different "things" here, basically depending on two >> >> >> independent questions: >> >> >> >> >> >> a.) if you want to shove the sizes into the info table, and >> >> >> b.) if you want cardmarking. >> >> >> >> >> >> Versions with/without cardmarking for different sizes can be done >> >> >> pretty >> >> >> easily, but as noted, the infotable variants are pretty invasive. >> >> >> >> >> >> -Edward >> >> >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett >> wrote: >> >> >>> >> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds >> up >> >> >>> if >> >> >>> they were small enough and there are enough of them. You get a bit >> >> >>> better >> >> >>> locality of reference in terms of what fits in the first cache >> line of >> >> >>> them. >> >> >>> >> >> >>> -Edward >> >> >>> >> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton >> >> >>> wrote: >> >> >>>> >> >> >>>> Yes. And for the short term I can imagine places we will settle >> with >> >> >>>> arrays even if it means tracking lengths unnecessarily and >> >> >>>> unsafeCoercing >> >> >>>> pointers whose types don't actually match their siblings. 
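[Editor's note: Ryan's cache-line spacing trick can be made concrete with a little arithmetic. The 64-byte line size and 8-byte slot size below are assumptions about a typical x86-64 heap, and the helper names are made up for illustration.]

```haskell
-- Assuming 64-byte cache lines and one-word (8-byte) array slots, two
-- slots land on different lines whenever their indices are at least 8
-- apart. Storing each hot pointer at a stride of one line's worth of
-- slots therefore keeps any two of them off the same line (given a
-- line-aligned array payload, which GHC does not itself guarantee).
slotsPerCacheLine :: Int
slotsPerCacheLine = 64 `div` 8

-- Array index at which to place the n-th padded field.
paddedSlot :: Int -> Int
paddedSlot n = n * slotsPerCacheLine
```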
>> >> >>>> >> >> >>>> Is there anything to recommend the hacks mentioned for fixed sized >> >> >>>> array >> >> >>>> objects *other* than using them to fake structs? (Much to >> >> >>>> derecommend, as >> >> >>>> you mentioned!) >> >> >>>> >> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett >> >> >>>> wrote: >> >> >>>>> >> >> >>>>> I think both are useful, but the one you suggest requires a lot >> more >> >> >>>>> plumbing and doesn't subsume all of the usecases of the other. >> >> >>>>> >> >> >>>>> -Edward >> >> >>>>> >> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > > >> >> >>>>> wrote: >> >> >>>>>> >> >> >>>>>> So that primitive is an array like thing (Same pointed type, >> >> >>>>>> unbounded >> >> >>>>>> length) with extra payload. >> >> >>>>>> >> >> >>>>>> I can see how we can do without structs if we have arrays, >> >> >>>>>> especially >> >> >>>>>> with the extra payload at front. But wouldn't the general >> solution >> >> >>>>>> for >> >> >>>>>> structs be one that that allows new user data type defs for # >> >> >>>>>> types? >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett >> >> >>>>>> wrote: >> >> >>>>>>> >> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >> >> >>>>>>> known >> >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >> >> >>>>>>> above, but >> >> >>>>>>> where the word counts were stored in the objects themselves. >> >> >>>>>>> >> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >> >> >>>>>>> likely >> >> >>>>>>> want to be something we build in addition to MutVar# rather >> than a >> >> >>>>>>> replacement. >> >> >>>>>>> >> >> >>>>>>> On the other hand, if we had to fix those numbers and build >> info >> >> >>>>>>> tables that knew them, and typechecker support, for instance, >> it'd >> >> >>>>>>> get >> >> >>>>>>> rather invasive. 
>> >> >>>>>>> >> >> >>>>>>> Also, a number of things that we can do with the 'sized' >> versions >> >> >>>>>>> above, like working with evil unsized c-style arrays directly >> >> >>>>>>> inline at the >> >> >>>>>>> end of the structure cease to be possible, so it isn't even a >> pure >> >> >>>>>>> win if we >> >> >>>>>>> did the engineering effort. >> >> >>>>>>> >> >> >>>>>>> I think 90% of the needs I have are covered just by adding the >> one >> >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >> >>>>>>> >> >> >>>>>>> -Edward >> >> >>>>>>> >> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton < >> rrnewton at gmail.com> >> >> >>>>>>> wrote: >> >> >>>>>>>> >> >> >>>>>>>> I like the possibility of a general solution for mutable >> structs >> >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's >> hard. >> >> >>>>>>>> >> >> >>>>>>>> So, we can't unpack MutVar into constructors because of object >> >> >>>>>>>> identity problems. But what about directly supporting an >> >> >>>>>>>> extensible set of >> >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) >> >> >>>>>>>> MutVar#? That >> >> >>>>>>>> may be too much work, but is it problematic otherwise? >> >> >>>>>>>> >> >> >>>>>>>> Needless to say, this is also critical if we ever want best in >> >> >>>>>>>> class >> >> >>>>>>>> lockfree mutable structures, just like their Stm and >> sequential >> >> >>>>>>>> counterparts. >> >> >>>>>>>> >> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >> >>>>>>>> wrote: >> >> >>>>>>>>> >> >> >>>>>>>>> At the very least I'll take this email and turn it into a >> short >> >> >>>>>>>>> article. >> >> >>>>>>>>> >> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >> >> >>>>>>>>> maybe >> >> >>>>>>>>> make a ticket for it. 
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Thanks >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Simon >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] >> >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >> >>>>>>>>> To: Simon Peyton Jones >> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >> >> >>>>>>>>> Subject: Re: ArrayArrays >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. >> It >> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or >> ByteArray#'s. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> While those live in #, they are garbage collected objects, so >> >> >>>>>>>>> this >> >> >>>>>>>>> all lives on the heap. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it >> has >> >> >>>>>>>>> to >> >> >>>>>>>>> deal with nested arrays. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better >> thing. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> The Problem >> >> >>>>>>>>> >> >> >>>>>>>>> ----------------- >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked >> >> >>>>>>>>> list >> >> >>>>>>>>> in Haskell. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 >> pointers >> >> >>>>>>>>> on >> >> >>>>>>>>> the heap.
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> >> >> >>>>>>>>> Maybe >> >> >>>>>>>>> DLL ~> DLL >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> That is 3 levels of indirection. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >> >> >>>>>>>>> worsening our representation. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> This means that every operation we perform on this structure >> >> >>>>>>>>> will >> >> >>>>>>>>> be about half of the speed of an implementation in most other >> >> >>>>>>>>> languages >> >> >>>>>>>>> assuming we're memory bound on loading things into cache! >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Making Progress >> >> >>>>>>>>> >> >> >>>>>>>>> ---------------------- >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I have been working on a number of data structures where the >> >> >>>>>>>>> indirection of going from something in * out to an object in >> # >> >> >>>>>>>>> which >> >> >>>>>>>>> contains the real pointer to my target and coming back >> >> >>>>>>>>> effectively doubles >> >> >>>>>>>>> my runtime. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >> >> >>>>>>>>> MutVar# >> >> >>>>>>>>> onto the mutable list when we dirty it. 
There is a well >> defined >> >> >>>>>>>>> write-barrier. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I could change out the representation to use >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I can just store two pointers in the MutableArray# every >> time, >> >> >>>>>>>>> but >> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount >> of >> >> >>>>>>>>> distinct >> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per >> >> >>>>>>>>> object to 2. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the >> >> >>>>>>>>> array >> >> >>>>>>>>> object and then chase it to the next DLL and chase that to >> the >> >> >>>>>>>>> next array. I >> >> >>>>>>>>> do get my two pointers together in memory though. I'm paying >> for >> >> >>>>>>>>> a card >> >> >>>>>>>>> marking table as well, which I don't particularly need with >> just >> >> >>>>>>>>> two >> >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" >> >> >>>>>>>>> machinery added >> >> >>>>>>>>> back in 7.10, which is just the old array code as a new data >> >> >>>>>>>>> type, which can >> >> >>>>>>>>> speed things up a bit when you don't have very big arrays: >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> But what if I wanted my object itself to live in # and have >> two >> >> >>>>>>>>> mutable fields and be able to share the same write barrier? >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types.
>> >> >>>>>>>>> >> What >> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the >> >> >>>>>>>>> impedance >> >> >>>>>>>>> mismatch between the imperative world and Haskell, and then >> just >> >> >>>>>>>>> let the >> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a >> >> >>>>>>>>> special >> >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even >> >> >>>>>>>>> abuse pattern >> >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to >> >> >>>>>>>>> make this >> >> >>>>>>>>> cheaper. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the >> preceding >> >> >>>>>>>>> and next >> >> >>>>>>>>> entry in the linked list. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' >> into a >> >> >>>>>>>>> strict world, and everything there lives in #. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> next :: DLL -> IO DLL >> >> >>>>>>>>> >> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of >> >> >>>>>>>>> >> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code >> to >> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty >> >> >>>>>>>>> easily when they >> >> >>>>>>>>> are known strict and you chain operations of this sort!
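[Editor's note: spelled out with imports and both directions, the accessor pattern Edward describes looks roughly like this. The slot assignment — 0 for the previous link, 1 for the next — is an assumption of this sketch, as the email does not fix a layout. GHC-7.10-era primops.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

data DLL = DLL (MutableArrayArray# RealWorld)

-- Read the previous/next MutableArrayArray# straight out of the node.
prev, next :: DLL -> IO DLL
prev (DLL m) = IO $ \s ->
  case readMutableArrayArray# m 0# s of
    (# s', p #) -> (# s', DLL p #)
next (DLL m) = IO $ \s ->
  case readMutableArrayArray# m 1# s of
    (# s', n #) -> (# s', DLL n #)

-- Overwrite the forward link; this is what dirties the object and puts
-- it on the mutable list.
setNext :: DLL -> DLL -> IO ()
setNext (DLL m) (DLL n) = IO $ \s ->
  case writeMutableArrayArray# m 1# n s of
    s' -> (# s', () #)
```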
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Cleaning it Up >> >> >>>>>>>>> >> >> >>>>>>>>> ------------------ >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Now I have one outermost indirection pointing to an array >> that >> >> >>>>>>>>> points directly to other arrays. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I >> can >> >> >>>>>>>>> fix >> >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and >> using a >> >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a >> >> >>>>>>>>> mixture of >> >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data >> structure. >> >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the >> >> >>>>>>>>> existing >> >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of >> the >> >> >>>>>>>>> arguments it >> >> >>>>>>>>> takes. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that >> >> >>>>>>>>> would >> >> >>>>>>>>> be best left unboxed. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can >> currently >> >> >>>>>>>>> at >> >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a >> >> >>>>>>>>> boxed or at a >> >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the >> int in >> >> >>>>>>>>> question in >> >> >>>>>>>>> there. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need >> to >> >> >>>>>>>>> store masks and administrivia as I walk down the tree. 
>> Having to >> >> >>>>>>>>> go off to >> >> >>>>>>>>> the side costs me the entire win from avoiding the first >> pointer >> >> >>>>>>>>> chase. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could >> >> >>>>>>>>> construct that had n words with unsafe access and m pointers >> to >> >> >>>>>>>>> other heap >> >> >>>>>>>>> objects, one that could put itself on the mutable list when >> any >> >> >>>>>>>>> of those >> >> >>>>>>>>> pointers changed then I could shed this last factor of two in >> >> >>>>>>>>> all >> >> >>>>>>>>> circumstances. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Prototype >> >> >>>>>>>>> >> >> >>>>>>>>> ------------- >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Over the last few days I've put together a small prototype >> >> >>>>>>>>> implementation with a few non-trivial imperative data >> structures >> >> >>>>>>>>> for things >> >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and >> >> >>>>>>>>> order-maintenance. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> https://github.com/ekmett/structs >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Notable bits: >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of >> >> >>>>>>>>> link-cut >> >> >>>>>>>>> trees in this style. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that >> >> >>>>>>>>> make >> >> >>>>>>>>> it go fast. 
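[Editor's note: to make the hash-array-mapped-trie example concrete, one hypothetical node layout in this style keeps the bitmap in a ByteArray# at slot 0 and the children in the remaining slots. Every name and slot choice below is an illustration, not the actual API of the structs package.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))
import Data.Bits (popCount, shiftL, (.&.))

data Node = Node (MutableArrayArray# RealWorld)

-- The bitmap lives in the ByteArray# stored at slot 0, so reading it
-- needs no extra pointer chase away from the node itself.
bitmap :: Node -> IO Int
bitmap (Node m) = IO $ \s ->
  case readByteArrayArray# m 0# s of
    (# s', ba #) -> (# s', I# (indexIntArray# ba 0#) #)

-- Children are stored densely after slot 0: the child for a 5-bit hash
-- fragment sits at 1 + popcount of the bitmap bits below that fragment.
childSlot :: Int -> Int -> Int
childSlot bm frag = 1 + popCount (bm .&. (shiftL 1 frag - 1))
```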
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost >> >> >>>>>>>>> all >> >> >>>>>>>>> the references to the LinkCut or Object data constructor get >> >> >>>>>>>>> optimized away, >> >> >>>>>>>>> and we're left with beautiful strict code directly mutating >> our >> >> >>>>>>>>> underlying >> >> >>>>>>>>> representation. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> At the very least I'll take this email and turn it into a >> short >> >> >>>>>>>>> article. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> -Edward >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >> >> >>>>>>>>> wrote: >> >> >>>>>>>>> >> >> >>>>>>>>> Just to say that I have no idea what is going on in this >> thread. >> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there >> a >> >> >>>>>>>>> ticket? Is >> >> >>>>>>>>> there a wiki page? >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a >> >> >>>>>>>>> good >> >> >>>>>>>>> thing. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Simon >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On >> Behalf >> >> >>>>>>>>> Of >> >> >>>>>>>>> Edward Kmett >> >> >>>>>>>>> Sent: 21 August 2015 05:25 >> >> >>>>>>>>> To: Manuel M T Chakravarty >> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >> >> >>>>>>>>> Subject: Re: ArrayArrays >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's >> would be >> >> >>>>>>>>> very handy as well.
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Consider right now if I have something like an >> order-maintenance >> >> >>>>>>>>> structure I have: >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# >> >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s >> >> >>>>>>>>> (Upper s)) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# >> >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s >> >> >>>>>>>>> (Lower s)) {-# >> >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> The former contains, logically, a mutable integer and two >> >> >>>>>>>>> pointers, >> >> >>>>>>>>> one for forward and one for backwards. The latter is >> basically >> >> >>>>>>>>> the same >> >> >>>>>>>>> thing with a mutable reference up pointing at the structure >> >> >>>>>>>>> above. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> On the heap this is an object that points to a structure for >> the >> >> >>>>>>>>> bytearray, and points to another structure for each mutvar >> which >> >> >>>>>>>>> each point >> >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >> >> >>>>>>>>> indirection smeared >> >> >>>>>>>>> over everything. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link >> >> >>>>>>>>> from >> >> >>>>>>>>> the structure below to the structure above. 
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Converted into ArrayArray#s I'd get >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and >> >> >>>>>>>>> the >> >> >>>>>>>>> next 2 slots pointing to the previous and next previous >> objects, >> >> >>>>>>>>> represented >> >> >>>>>>>>> just as their MutableArrayArray#s. I can use >> >> >>>>>>>>> sameMutableArrayArray# on these >> >> >>>>>>>>> for object identity, which lets me check for the ends of the >> >> >>>>>>>>> lists by tying >> >> >>>>>>>>> things back on themselves. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> and below that >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up >> to >> >> >>>>>>>>> an >> >> >>>>>>>>> upper structure. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I can then write a handful of combinators for getting out the >> >> >>>>>>>>> slots >> >> >>>>>>>>> in question, while it has gained a level of indirection >> between >> >> >>>>>>>>> the wrapper >> >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one >> can >> >> >>>>>>>>> be basically >> >> >>>>>>>>> erased by ghc. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Unlike before I don't have several separate objects on the >> heap >> >> >>>>>>>>> for >> >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the >> >> >>>>>>>>> object itself, >> >> >>>>>>>>> and the MutableByteArray# that it references to carry around >> the >> >> >>>>>>>>> mutable >> >> >>>>>>>>> int. 
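[Editor's note: a sketch of the "handful of combinators" Edward mentions for the Upper structure, under an assumed layout of slot 0 = the MutableByteArray# holding the integer, slots 1 and 2 = the previous/next links. The layout and names are illustrative only.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

data Upper = Upper (MutableArrayArray# RealWorld)

-- The logical mutable Int lives at word 0 of the byte array in slot 0.
getKey :: Upper -> IO Int
getKey (Upper m) = IO $ \s ->
  case readMutableByteArrayArray# m 0# s of
    (# s1, mba #) ->
      case readIntArray# mba 0# s1 of
        (# s2, k #) -> (# s2, I# k #)

-- Forward link, assumed to sit in slot 2.
nextUpper :: Upper -> IO Upper
nextUpper (Upper m) = IO $ \s ->
  case readMutableArrayArray# m 2# s of
    (# s', n #) -> (# s', Upper n #)

-- Object identity, used to spot list ends tied back on themselves.
sameUpper :: Upper -> Upper -> Bool
sameUpper (Upper a) (Upper b) = isTrue# (sameMutableArrayArray# a b)
```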
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> The only pain points are >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me >> >> >>>>>>>>> from >> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array >> into an >> >> >>>>>>>>> ArrayArray >> >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of >> >> >>>>>>>>> Haskell, >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> and >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid >> the >> >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 >> pointers >> >> >>>>>>>>> wide. Card >> >> >>>>>>>>> marking doesn't help. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Alternately I could just try to do really evil things and >> >> >>>>>>>>> convert >> >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >> >> >>>>>>>>> unsafeCoerce my way >> >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >> >> >>>>>>>>> directly into the >> >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by >> >> >>>>>>>>> aping the >> >> >>>>>>>>> MutableArrayArray# s API, but that gets really really >> dangerous! >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the >> >> >>>>>>>>> altar >> >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move >> them >> >> >>>>>>>>> and collect >> >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> -Edward >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >> >> >>>>>>>>> wrote: >> >> >>>>>>>>> >> >> >>>>>>>>> That?s an interesting idea. 
>> >> >>>>>>>>> >> >> >>>>>>>>> Manuel >> >> >>>>>>>>> >> >> >>>>>>>>> > Edward Kmett : >> >> >>>>>>>>> >> >> >>>>>>>>> > >> >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# >> and >> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >> >> >>>>>>>>> > ArrayArray# entries >> >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection >> for >> >> >>>>>>>>> > the containing >> >> >>>>>>>>> > structure is amazing, but I can only currently use it if my >> >> >>>>>>>>> > leaf level data >> >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. >> It'd be >> >> >>>>>>>>> > nice to be >> >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at >> >> >>>>>>>>> > the leaves to >> >> >>>>>>>>> > hold lifted contents. >> >> >>>>>>>>> > >> >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to >> >> >>>>>>>>> > access >> >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do >> that >> >> >>>>>>>>> > if i tried to >> >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a >> >> >>>>>>>>> > ByteArray# >> >> >>>>>>>>> > anyways, so it isn't like there is a safety story >> preventing >> >> >>>>>>>>> > this. >> >> >>>>>>>>> > >> >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection >> >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I >> >> >>>>>>>>> > could shoehorn a >> >> >>>>>>>>> > number of them into ArrayArrays if this worked. >> >> >>>>>>>>> > >> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >> >> >>>>>>>>> > indirection compared to c/java and this could reduce that >> pain >> >> >>>>>>>>> > to just 1 >> >> >>>>>>>>> > level of unnecessary indirection. 
>> >> >>>>>>>>> > >> >> >>>>>>>>> > -Edward >> >> >>>>>>>>> >> >> >>>>>>>>> > _______________________________________________ >> >> >>>>>>>>> > ghc-devs mailing list >> >> >>>>>>>>> > ghc-devs at haskell.org >> >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> _______________________________________________ >> >> >>>>>>>>> ghc-devs mailing list >> >> >>>>>>>>> ghc-devs at haskell.org >> >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> >>>>>>> >> >> >>>>>>> >> >> >>>>> >> >> >>> >> >> >> >> >> > >> >> > >> >> > _______________________________________________ >> >> > ghc-devs mailing list >> >> > ghc-devs at haskell.org >> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> > >> > >> > >> >> >> >> >> >> >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekmett at gmail.com Mon Sep 7 20:31:57 2015 From: ekmett at gmail.com (Edward Kmett) Date: Mon, 7 Sep 2015 16:31:57 -0400 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: Indeed. I can CAS today with appropriately coerced primitives. -Edward On Mon, Sep 7, 2015 at 4:27 PM, Ryan Newton wrote: > Ah, incidentally that introduces an interesting difference between > atomicModify and CAS. CAS should be able to work on mutable locations in > that subset of # that are represented by a gcptr, whereas Edward pointed > out that atomicModify cannot. 
> > (Indeed, to use lock-free algorithms with these new unboxed mutable > structures we'll need CAS on the slots.) > > On Mon, Sep 7, 2015 at 4:16 PM, Edward Kmett wrote: > >> I had a brief discussion with Richard during the Haskell Symposium about >> how we might be able to let parametricity help a bit in reducing the space >> of necessary primops to a slightly more manageable level. >> >> Notably, it'd be interesting to explore the ability to allow >> parametricity over the portion of # that is just a gcptr. >> >> We could do this if the levity polymorphism machinery was tweaked a bit. >> You could envision the ability to abstract over things in both * and the >> subset of # that are represented by a gcptr, then modifying the existing >> array primitives to be parametric in that choice of levity for their >> argument so long as it was of a "heap object" levity. >> >> This could make the menagerie of ways to pack >> {Small}{Mutable}Array{Array}# references into a >> {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the >> need for folks to descend into the use of the more evil structure >> primitives we're talking about, and letting us keep a few more principles >> around us. >> >> Then in the cases like `atomicModifyMutVar#` where it needs to actually >> be in * rather than just a gcptr, due to the constructed field selectors it >> introduces on the heap then we could keep the existing less polymorphic >> type. >> >> -Edward >> >> On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > > wrote: >> >>> It was fun to meet and discuss this. >>> >>> >>> >>> Did someone volunteer to write a wiki page that describes the proposed >>> design? And, I earnestly hope, also describes the menagerie of currently >>> available array types and primops so that users can have some chance of >>> picking the right one?!
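[Editorial sketch of what "CAS today with appropriately coerced primitives" can look like: casArray# (available since GHC 7.8) is defined for MutableArray#, so the ArrayArray# and its slot values are unsafeCoerce#d into that world. casSlot is an invented name, not a GHC primop, and nothing checks these coercions.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only: we coerce the MutableArrayArray# and its unlifted slot
-- values into MutableArray# RealWorld Any so casArray# will accept
-- them.  Slot i must genuinely hold a gcptr for this to be sound.
import GHC.Exts

casSlot :: MutableArrayArray# RealWorld  -- object holding the slots
        -> Int#                          -- slot index
        -> MutableArrayArray# RealWorld  -- expected old value
        -> MutableArrayArray# RealWorld  -- new value
        -> State# RealWorld
        -> (# State# RealWorld, Int# #)  -- 0# on success, as casArray#
casSlot m i old new s =
  case casArray# (unsafeCoerce# m :: MutableArray# RealWorld Any)
                 i (unsafeCoerce# old) (unsafeCoerce# new) s of
    (# s', failed, _seen #) -> (# s', failed #)
```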
>>> >>> >>> >>> Thanks >>> >>> >>> >>> Simon >>> >>> >>> >>> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan >>> Newton >>> *Sent:* 31 August 2015 23:11 >>> *To:* Edward Kmett; Johan Tibell >>> *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; >>> Ryan Scott; Ryan Yates >>> *Subject:* Re: ArrayArrays >>> >>> >>> >>> Dear Edward, Ryan Yates, and other interested parties -- >>> >>> >>> >>> So when should we meet up about this? >>> >>> >>> >>> May I propose the Tues afternoon break for everyone at ICFP who is >>> interested in this topic? We can meet out in the coffee area and >>> congregate around Edward Kmett, who is tall and should be easy to find ;-). >>> >>> >>> >>> I think Ryan is going to show us how to use his new primops for combined >>> array + other fields in one heap object? >>> >>> >>> >>> On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: >>> >>> Without a custom primitive it doesn't help much there, you have to store >>> the indirection to the mask. >>> >>> >>> >>> With a custom primitive it should cut the on heap root-to-leaf path of >>> everything in the HAMT in half. A shorter HashMap was actually one of the >>> motivating factors for me doing this. It is rather astoundingly difficult >>> to beat the performance of HashMap, so I had to start cheating pretty >>> badly. ;) >>> >>> >>> >>> -Edward >>> >>> >>> >>> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell >>> wrote: >>> >>> I'd also be interested to chat at ICFP to see if I can use this for my >>> HAMT implementation. >>> >>> >>> >>> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote: >>> >>> Sounds good to me. Right now I'm just hacking up composable accessors >>> for "typed slots" in a fairly lens-like fashion, and treating the set of >>> slots I define and the 'new' function I build for the data type as its API, >>> and build atop that. 
This could eventually graduate to template-haskell, >>> but I'm not entirely satisfied with the solution I have. I currently >>> distinguish between what I'm calling "slots" (things that point directly to >>> another SmallMutableArrayArray# sans wrapper) and "fields" which point >>> directly to the usual Haskell data types because unifying the two notions >>> meant that I couldn't lift some coercions out "far enough" to make them >>> vanish. >>> >>> >>> >>> I'll be happy to run through my current working set of issues in person >>> and -- as things get nailed down further -- in a longer lived medium than >>> in personal conversations. ;) >>> >>> >>> >>> -Edward >>> >>> >>> >>> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote: >>> >>> I'd also love to meet up at ICFP and discuss this. I think the array >>> primops plus a TH layer that lets us (ab)use them many times without too much >>> marginal cost sounds great. And I'd like to learn how we could be either >>> early users of, or help with, this infrastructure. >>> >>> >>> >>> CC'ing in Ryan Scott and Omer Agacan who may also be interested in >>> dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student >>> who is currently working on concurrent data structures in Haskell, but will >>> not be at ICFP. >>> >>> >>> >>> >>> >>> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote: >>> >>> I completely agree. I would love to spend some time during ICFP and >>> friends talking about what it could look like. My small array for STM >>> changes for the RTS can be seen here [1]. It is on a branch somewhere >>> between 7.8 and 7.10 and includes irrelevant STM bits and some >>> confusing naming choices (sorry), but should cover all the details >>> needed to implement it for a non-STM context. The biggest surprise >>> for me was following small array too closely and having a word/byte >>> offset mismatch [2].
>>> >>> [1]: >>> https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut >>> [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 >>> >>> Ryan >>> >>> >>> On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote: >>> > I'd love to have that last 10%, but it's a lot of work to get there and >>> more >>> > importantly I don't know quite what it should look like. >>> > >>> > On the other hand, I do have a pretty good idea of how the primitives >>> above >>> > could be banged out and tested in a long evening, well in time for >>> 7.12. And >>> > as noted earlier, those remain useful even if a nicer typed version >>> with an >>> > extra level of indirection to the sizes is built up after. >>> > >>> > The rest sounds like a good graduate student project for someone who >>> has >>> > graduate students lying around. Maybe somebody at Indiana University >>> who has >>> > an interest in type theory and parallelism can find us one. =) >>> > >>> > -Edward >>> > >>> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates >>> wrote: >>> >> >>> >> I think from my perspective, the motivation for getting the type >>> >> checker involved is primarily bringing this to the level where users >>> >> could be expected to build these structures. It is reasonable to >>> >> think that there are people who want to use STM (a context with >>> >> mutation already) to implement a straightforward data structure that >>> >> avoids extra indirection penalty. There should be some places where >>> >> knowing that things are field accesses rather than array indexing >>> >> could be helpful, but I think GHC is good right now about handling >>> >> constant offsets. In my code I don't do any bounds checking as I know >>> >> I will only be accessing my arrays with constant indexes. I make >>> >> wrappers for each field access and leave all the unsafe stuff in >>> >> there. When things go wrong though, the compiler is no help.
Maybe >>> >> template Haskell that generates the appropriate wrappers is the right >>> >> direction to go. >>> >> There is another benefit for me when working with these as arrays in >>> >> that it is quite simple and direct (given the hoops already jumped >>> >> through) to play with alignment. I can ensure two pointers are never >>> >> on the same cache-line by just spacing things out in the array. >>> >> >>> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett >>> wrote: >>> >> > They just segfault at this level. ;) >>> >> > >>> >> > Sent from my iPhone >>> >> > >>> >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton >>> wrote: >>> >> > >>> >> > You presumably also save a bounds check on reads by hard-coding the >>> >> > sizes? >>> >> > >>> >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett >>> wrote: >>> >> >> >>> >> >> Also there are 4 different "things" here, basically depending on >>> two >>> >> >> independent questions: >>> >> >> >>> >> >> a.) if you want to shove the sizes into the info table, and >>> >> >> b.) if you want cardmarking. >>> >> >> >>> >> >> Versions with/without cardmarking for different sizes can be done >>> >> >> pretty >>> >> >> easily, but as noted, the infotable variants are pretty invasive. >>> >> >> >>> >> >> -Edward >>> >> >> >>> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett >>> wrote: >>> >> >>> >>> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds >>> up >>> >> >>> if >>> >> >>> they were small enough and there are enough of them. You get a bit >>> >> >>> better >>> >> >>> locality of reference in terms of what fits in the first cache >>> line of >>> >> >>> them. >>> >> >>> >>> >> >>> -Edward >>> >> >>> >>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton >>> >> >>> wrote: >>> >> >>>> >>> >> >>>> Yes. 
And for the short term I can imagine places we will settle >>> with >>> >> >>>> arrays even if it means tracking lengths unnecessarily and >>> >> >>>> unsafeCoercing >>> >> >>>> pointers whose types don't actually match their siblings. >>> >> >>>> >>> >> >>>> Is there anything to recommend the hacks mentioned for fixed >>> sized >>> >> >>>> array >>> >> >>>> objects *other* than using them to fake structs? (Much to >>> >> >>>> derecommend, as >>> >> >>>> you mentioned!) >>> >> >>>> >>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett >>> >> >>>> wrote: >>> >> >>>>> >>> >> >>>>> I think both are useful, but the one you suggest requires a lot >>> more >>> >> >>>>> plumbing and doesn't subsume all of the use cases of the other. >>> >> >>>>> >>> >> >>>>> -Edward >>> >> >>>>> >>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton < >>> rrnewton at gmail.com> >>> >> >>>>> wrote: >>> >> >>>>>> >>> >> >>>>>> So that primitive is an array-like thing (same pointed type, >>> >> >>>>>> unbounded >>> >> >>>>>> length) with extra payload. >>> >> >>>>>> >>> >> >>>>>> I can see how we can do without structs if we have arrays, >>> >> >>>>>> especially >>> >> >>>>>> with the extra payload at front. But wouldn't the general >>> solution >>> >> >>>>>> for >>> >> >>>>>> structs be one that allows new user data type defs for # >>> >> >>>>>> types? >>> >> >>>>>> >>> >> >>>>>> >>> >> >>>>>> >>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett >> > >>> >> >>>>>> wrote: >>> >> >>>>>>> >>> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >>> >> >>>>>>> known >>> >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >>> >> >>>>>>> above, but >>> >> >>>>>>> where the word counts were stored in the objects themselves.
>>> >> >>>>>>> >>> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >>> >> >>>>>>> likely >>> >> >>>>>>> want to be something we build in addition to MutVar# rather >>> than a >>> >> >>>>>>> replacement. >>> >> >>>>>>> >>> >> >>>>>>> On the other hand, if we had to fix those numbers and build >>> info >>> >> >>>>>>> tables that knew them, and typechecker support, for instance, >>> it'd >>> >> >>>>>>> get >>> >> >>>>>>> rather invasive. >>> >> >>>>>>> >>> >> >>>>>>> Also, a number of things that we can do with the 'sized' >>> versions >>> >> >>>>>>> above, like working with evil unsized c-style arrays directly >>> >> >>>>>>> inline at the >>> >> >>>>>>> end of the structure cease to be possible, so it isn't even a >>> pure >>> >> >>>>>>> win if we >>> >> >>>>>>> did the engineering effort. >>> >> >>>>>>> >>> >> >>>>>>> I think 90% of the needs I have are covered just by adding >>> the one >>> >> >>>>>>> primitive. The last 10% gets pretty invasive. >>> >> >>>>>>> >>> >> >>>>>>> -Edward >>> >> >>>>>>> >>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton < >>> rrnewton at gmail.com> >>> >> >>>>>>> wrote: >>> >> >>>>>>>> >>> >> >>>>>>>> I like the possibility of a general solution for mutable >>> structs >>> >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's >>> hard. >>> >> >>>>>>>> >>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of >>> object >>> >> >>>>>>>> identity problems. But what about directly supporting an >>> >> >>>>>>>> extensible set of >>> >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even >>> replacing) >>> >> >>>>>>>> MutVar#? That >>> >> >>>>>>>> may be too much work, but is it problematic otherwise? >>> >> >>>>>>>> >>> >> >>>>>>>> Needless to say, this is also critical if we ever want best >>> in >>> >> >>>>>>>> class >>> >> >>>>>>>> lockfree mutable structures, just like their Stm and >>> sequential >>> >> >>>>>>>> counterparts. 
>>> >> >>>>>>>> >>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >>> >> >>>>>>>> wrote: >>> >> >>>>>>>>> >>> >> >>>>>>>>> At the very least I'll take this email and turn it into a >>> short >>> >> >>>>>>>>> article. >>> >> >>>>>>>>> >>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >>> >> >>>>>>>>> maybe >>> >> >>>>>>>>> make a ticket for it. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Thanks >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Simon >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] >>> >> >>>>>>>>> Sent: 27 August 2015 16:54 >>> >> >>>>>>>>> To: Simon Peyton Jones >>> >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >>> >> >>>>>>>>> Subject: Re: ArrayArrays >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. >>> It >>> >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or >>> ByteArray#'s. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> While those live in #, they are garbage collected objects, >>> so >>> >> >>>>>>>>> this >>> >> >>>>>>>>> all lives on the heap. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it >>> has >>> >> >>>>>>>>> to >>> >> >>>>>>>>> deal with nested arrays. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better >>> thing. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> The Problem >>> >> >>>>>>>>> >>> >> >>>>>>>>> ----------------- >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Consider the scenario where you write a classic >>> doubly-linked >>> >> >>>>>>>>> list >>> >> >>>>>>>>> in Haskell. 
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 >>> pointers >>> >> >>>>>>>>> on >>> >> >>>>>>>>> the heap. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> >>> >> >>>>>>>>> Maybe >>> >> >>>>>>>>> DLL ~> DLL >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> That is 3 levels of indirection. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >>> >> >>>>>>>>> -funbox-strict-fields or UNPACK >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL >>> and >>> >> >>>>>>>>> worsening our representation. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> but now we're still stuck with a level of indirection >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> This means that every operation we perform on this structure >>> >> >>>>>>>>> will >>> >> >>>>>>>>> be about half of the speed of an implementation in most >>> other >>> >> >>>>>>>>> languages >>> >> >>>>>>>>> assuming we're memory bound on loading things into cache!
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Making Progress >>> >> >>>>>>>>> >>> >> >>>>>>>>> ---------------------- >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I have been working on a number of data structures where the >>> >> >>>>>>>>> indirection of going from something in * out to an object >>> in # >>> >> >>>>>>>>> which >>> >> >>>>>>>>> contains the real pointer to my target and coming back >>> >> >>>>>>>>> effectively doubles >>> >> >>>>>>>>> my runtime. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >>> >> >>>>>>>>> MutVar# >>> >> >>>>>>>>> onto the mutable list when we dirty it. There is a well >>> defined >>> >> >>>>>>>>> write-barrier. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I could change out the representation to use >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every >>> time, >>> >> >>>>>>>>> but >>> >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the >>> amount of >>> >> >>>>>>>>> distinct >>> >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per >>> >> >>>>>>>>> object to 2. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to >>> the >>> >> >>>>>>>>> array >>> >> >>>>>>>>> object and then chase it to the next DLL and chase that to >>> the >>> >> >>>>>>>>> next array. I >>> >> >>>>>>>>> do get my two pointers together in memory though. 
I'm >>> paying for >>> >> >>>>>>>>> a card >>> >> >>>>>>>>> marking table as well, which I don't particularly need with >>> just >>> >> >>>>>>>>> two >>> >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" >>> >> >>>>>>>>> machinery added >>> >> >>>>>>>>> back in 7.10, which is just the old array code as a new data >>> >> >>>>>>>>> type, which can >>> >> >>>>>>>>> speed things up a bit when you don't have very big arrays: >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> But what if I wanted my object itself to live in # and have >>> two >>> >> >>>>>>>>> mutable fields and be able to share the same write barrier? >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array >>> types. >>> >> >>>>>>>>> What >>> >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with >>> the >>> >> >>>>>>>>> impedance >>> >> >>>>>>>>> mismatch between the imperative world and Haskell, and then >>> just >>> >> >>>>>>>>> let the >>> >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a >>> >> >>>>>>>>> special >>> >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even >>> >> >>>>>>>>> abuse pattern >>> >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to >>> >> >>>>>>>>> make this >>> >> >>>>>>>>> cheaper. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >>> >> >>>>>>>>> writeMutableArrayArray# calls to directly access the >>> preceding >>> >> >>>>>>>>> and next >>> >> >>>>>>>>> entry in the linked list.
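[Editorial sketch of the node operations Edward describes, spelled out with the full primop names from GHC.Prim: the calls that read and write MutableArrayArray# elements of a MutableArrayArray# are readMutableArrayArrayArray# and writeMutableArrayArrayArray#. Helper names and the slot layout (slot 0 = previous, slot 1 = next) are assumptions; these primops belong to the GHC 7.x/8.x era and were later replaced by levity-polymorphic arrays.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only, invented helper names.  A node is a 2-slot
-- MutableArrayArray#.  newArrayArray# makes every slot refer back to
-- the new array itself, so a fresh node is its own neighbour -- a
-- convenient stand-in for Nil, detectable with sameMutableArrayArray#.
import GHC.Exts
import GHC.Types (IO(..))

data DLL = DLL (MutableArrayArray# RealWorld)

newNode :: IO DLL
newNode = IO $ \s -> case newArrayArray# 2# s of
  (# s', m #) -> (# s', DLL m #)

nextNode :: DLL -> IO DLL
nextNode (DLL m) = IO $ \s ->
  case readMutableArrayArrayArray# m 1# s of
    (# s', n #) -> (# s', DLL n #)

setNext :: DLL -> DLL -> IO ()
setNext (DLL m) (DLL n) = IO $ \s ->
  (# writeMutableArrayArrayArray# m 1# n s, () #)

-- The ends of the list tie back on themselves, so "is this the last
-- node?" is an object-identity check rather than a pattern match.
isLast :: DLL -> IO Bool
isLast (DLL m) = IO $ \s ->
  case readMutableArrayArrayArray# m 1# s of
    (# s', n #) -> (# s', isTrue# (sameMutableArrayArray# m n) #)
```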
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' >>> into a >>> >> >>>>>>>>> strict world, and everything there lives in #. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> next :: DLL -> IO DLL >>> >> >>>>>>>>> >>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of >>> >> >>>>>>>>> >>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that >>> code to >>> >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty >>> >> >>>>>>>>> easily when they >>> >> >>>>>>>>> are known strict and you chain operations of this sort! >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Cleaning it Up >>> >> >>>>>>>>> >>> >> >>>>>>>>> ------------------ >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Now I have one outermost indirection pointing to an array >>> that >>> >> >>>>>>>>> points directly to other arrays. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I >>> can >>> >> >>>>>>>>> fix >>> >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and >>> using a >>> >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store >>> a >>> >> >>>>>>>>> mixture of >>> >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data >>> structure. >>> >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the >>> >> >>>>>>>>> existing >>> >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of >>> the >>> >> >>>>>>>>> arguments it >>> >> >>>>>>>>> takes. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> This is almost ideal, but not quite. I often have fields >>> that >>> >> >>>>>>>>> would >>> >> >>>>>>>>> be best left unboxed.
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can >>> currently >>> >> >>>>>>>>> at >>> >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a >>> >> >>>>>>>>> boxed or at a >>> >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the >>> int in >>> >> >>>>>>>>> question in >>> >> >>>>>>>>> there. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need >>> to >>> >> >>>>>>>>> store masks and administrivia as I walk down the tree. >>> Having to >>> >> >>>>>>>>> go off to >>> >> >>>>>>>>> the side costs me the entire win from avoiding the first >>> pointer >>> >> >>>>>>>>> chase. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could >>> >> >>>>>>>>> construct that had n words with unsafe access and m >>> pointers to >>> >> >>>>>>>>> other heap >>> >> >>>>>>>>> objects, one that could put itself on the mutable list when >>> any >>> >> >>>>>>>>> of those >>> >> >>>>>>>>> pointers changed then I could shed this last factor of two >>> in >>> >> >>>>>>>>> all >>> >> >>>>>>>>> circumstances. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Prototype >>> >> >>>>>>>>> >>> >> >>>>>>>>> ------------- >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Over the last few days I've put together a small prototype >>> >> >>>>>>>>> implementation with a few non-trivial imperative data >>> structures >>> >> >>>>>>>>> for things >>> >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and >>> >> >>>>>>>>> order-maintenance. 
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> https://github.com/ekmett/structs >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Notable bits: >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of >>> >> >>>>>>>>> link-cut >>> >> >>>>>>>>> trees in this style. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts >>> that >>> >> >>>>>>>>> make >>> >> >>>>>>>>> it go fast. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, >>> almost >>> >> >>>>>>>>> all >>> >> >>>>>>>>> the references to the LinkCut or Object data constructor get >>> >> >>>>>>>>> optimized away, >>> >> >>>>>>>>> and we're left with beautiful strict code directly mutating >>> our >>> >> >>>>>>>>> underlying >>> >> >>>>>>>>> representation. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> At the very least I'll take this email and turn it into a >>> short >>> >> >>>>>>>>> article. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> -Edward >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >>> >> >>>>>>>>> wrote: >>> >> >>>>>>>>> >>> >> >>>>>>>>> Just to say that I have no idea what is going on in this >>> thread. >>> >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is >>> there a >>> >> >>>>>>>>> ticket? Is >>> >> >>>>>>>>> there a wiki page? >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be >>> a >>> >> >>>>>>>>> good >>> >> >>>>>>>>> thing.
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Simon >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On >>> Behalf >>> >> >>>>>>>>> Of >>> >> >>>>>>>>> Edward Kmett >>> >> >>>>>>>>> Sent: 21 August 2015 05:25 >>> >> >>>>>>>>> To: Manuel M T Chakravarty >>> >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >>> >> >>>>>>>>> Subject: Re: ArrayArrays >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's >>> would be >>> >> >>>>>>>>> very handy as well. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Consider right now if I have something like an >>> order-maintenance >>> >> >>>>>>>>> structure I have: >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) >>> {-# >>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s >>> >> >>>>>>>>> (Upper s)) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) >>> {-# >>> >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s >>> >> >>>>>>>>> (Lower s)) {-# >>> >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> The former contains, logically, a mutable integer and two >>> >> >>>>>>>>> pointers, >>> >> >>>>>>>>> one for forward and one for backwards. The latter is >>> basically >>> >> >>>>>>>>> the same >>> >> >>>>>>>>> thing with a mutable reference up pointing at the structure >>> >> >>>>>>>>> above. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> On the heap this is an object that points to a structure >>> for the >>> >> >>>>>>>>> bytearray, and points to another structure for each mutvar >>> which >>> >> >>>>>>>>> each point >>> >> >>>>>>>>> to the other 'Upper' structure. 
So there is a level of >>> >> >>>>>>>>> indirection smeared >>> >> >>>>>>>>> over everything. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link >>> >> >>>>>>>>> from >>> >> >>>>>>>>> the structure below to the structure above. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, >>> and >>> >> >>>>>>>>> the >>> >> >>>>>>>>> next 2 slots pointing to the previous and next >>> objects, >>> >> >>>>>>>>> represented >>> >> >>>>>>>>> just as their MutableArrayArray#s. I can use >>> >> >>>>>>>>> sameMutableArrayArray# on these >>> >> >>>>>>>>> for object identity, which lets me check for the ends of the >>> >> >>>>>>>>> lists by tying >>> >> >>>>>>>>> things back on themselves. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> and below that >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing >>> up to >>> >> >>>>>>>>> an >>> >> >>>>>>>>> upper structure. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I can then write a handful of combinators for getting out >>> the >>> >> >>>>>>>>> slots >>> >> >>>>>>>>> in question, while it has gained a level of indirection >>> between >>> >> >>>>>>>>> the wrapper >>> >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one >>> can >>> >> >>>>>>>>> be basically >>> >> >>>>>>>>> erased by ghc.
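[Editorial sketch of the "handful of combinators" for Upper. The slot layout is an assumption (slot 0 the MutableByteArray# label, slot 1 the previous node, slot 2 the next node), and all helper names are invented.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Hedged sketch of slot combinators for the Upper structure above.
import GHC.Exts
import GHC.Types (IO(..))

data Upper = Upper (MutableArrayArray# RealWorld)

-- Read the mutable integer out of the byte array in slot 0.
label :: Upper -> IO Int
label (Upper m) = IO $ \s ->
  case readMutableByteArrayArray# m 0# s of
    (# s1, mba #) -> case readIntArray# mba 0# s1 of
      (# s2, i #) -> (# s2, I# i #)

nextUpper :: Upper -> IO Upper
nextUpper (Upper m) = IO $ \s ->
  case readMutableArrayArrayArray# m 2# s of
    (# s1, n #) -> (# s1, Upper n #)

-- The ends of the lists tie back on themselves, so the end check is
-- object identity via sameMutableArrayArray#.
isLastUpper :: Upper -> IO Bool
isLastUpper (Upper m) = IO $ \s ->
  case readMutableArrayArrayArray# m 2# s of
    (# s1, n #) -> (# s1, isTrue# (sameMutableArrayArray# m n) #)
```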
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Unlike before I don't have several separate objects on the >>> heap >>> >> >>>>>>>>> for >>> >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for >>> the >>> >> >>>>>>>>> object itself, >>> >> >>>>>>>>> and the MutableByteArray# that it references to carry >>> around the >>> >> >>>>>>>>> mutable >>> >> >>>>>>>>> int. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> The only pain points are >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me >>> >> >>>>>>>>> from >>> >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array >>> into an >>> >> >>>>>>>>> ArrayArray >>> >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of >>> >> >>>>>>>>> Haskell, >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> and >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid >>> the >>> >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 >>> pointers >>> >> >>>>>>>>> wide. Card >>> >> >>>>>>>>> marking doesn't help. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Alternately I could just try to do really evil things and >>> >> >>>>>>>>> convert >>> >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >>> >> >>>>>>>>> unsafeCoerce my way >>> >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >>> >> >>>>>>>>> directly into the >>> >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by >>> >> >>>>>>>>> aping the >>> >> >>>>>>>>> MutableArrayArray# s API, but that gets really really >>> dangerous! 
>>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the >>> >> >>>>>>>>> altar >>> >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move >>> them >>> >> >>>>>>>>> and collect >>> >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> -Edward >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >>> >> >>>>>>>>> wrote: >>> >> >>>>>>>>> >>> >> >>>>>>>>> That's an interesting idea. >>> >> >>>>>>>>> >>> >> >>>>>>>>> Manuel >>> >> >>>>>>>>> >>> >> >>>>>>>>> > Edward Kmett : >>> >> >>>>>>>>> >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# >>> and >>> >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >>> >> >>>>>>>>> > ArrayArray# entries >>> >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection >>> for >>> >> >>>>>>>>> > the containing >>> >> >>>>>>>>> > structure is amazing, but I can only currently use it if >>> my >>> >> >>>>>>>>> > leaf level data >>> >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. >>> It'd be >>> >> >>>>>>>>> > nice to be >>> >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down >>> at >>> >> >>>>>>>>> > the leaves to >>> >> >>>>>>>>> > hold lifted contents. >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to >>> >> >>>>>>>>> > access >>> >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do >>> that >>> >> >>>>>>>>> > if I tried to >>> >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a >>> >> >>>>>>>>> > ByteArray# >>> >> >>>>>>>>> > anyways, so it isn't like there is a safety story >>> preventing >>> >> >>>>>>>>> > this. 
>>> >> >>>>>>>>> > >>> >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection >>> >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I >>> >> >>>>>>>>> > could shoehorn a >>> >> >>>>>>>>> > number of them into ArrayArrays if this worked. >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of >>> unnecessary >>> >> >>>>>>>>> > indirection compared to c/java and this could reduce that >>> pain >>> >> >>>>>>>>> > to just 1 >>> >> >>>>>>>>> > level of unnecessary indirection. >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > -Edward >>> >> >>>>>>>>> >>> >> >>>>>>>>> > _______________________________________________ >>> >> >>>>>>>>> > ghc-devs mailing list >>> >> >>>>>>>>> > ghc-devs at haskell.org >>> >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> _______________________________________________ >>> >> >>>>>>>>> ghc-devs mailing list >>> >> >>>>>>>>> ghc-devs at haskell.org >>> >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >> >>>>>>> >>> >> >>>>>>> >>> >> >>>>> >>> >> >>> >>> >> >> >>> >> > >>> >> > >>> >> > _______________________________________________ >>> >> > ghc-devs mailing list >>> >> > ghc-devs at haskell.org >>> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >> > >>> > >>> > >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >>> >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dan.doel at gmail.com Mon Sep 7 21:23:38 2015 From: dan.doel at gmail.com (Dan Doel) Date: Mon, 7 Sep 2015 17:23:38 -0400 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: On Mon, Sep 7, 2015 at 4:16 PM, Edward Kmett wrote: > Notably, it'd be interesting to explore the ability to allow parametricity > over the portion of # that is just a gcptr. Which is also a necessary part of Ed Yang's unlifted types proposal. This portion of # becomes the `Unlifted` kind, and it should be possible to have parametric polymorphism for it (and if that isn't stated outright, several things in the proposal assume you have it). -- Dan From ezyang at mit.edu Mon Sep 7 21:35:58 2015 From: ezyang at mit.edu (Edward Z. Yang) Date: Mon, 07 Sep 2015 14:35:58 -0700 Subject: Unlifted data types In-Reply-To: <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <1441661177-sup-2150@sabre> Hello Simon, > There are several distinct things being mixed up. I've split the document into three (four?) distinct subproposals. Proposals 1 and 2 stand alone. > (1) First, a proposal to allow a data type to be declared to be unlifted. On its own, this is a pretty simple proposal: [snip] > > I would really like to see this articulated as a stand-alone proposal. It makes sense by itself, and is really pretty simple. This is now "Proposal 1". > (2) Second, we cannot expect levity polymorphism. Consider > map f (x:xs) = f x : map f xs > Is the (f x) a thunk or is it evaluated strictly? Unless you are going to clone the code for map (which levity polymorphism is there to avoid), we can't answer "it depends on the type of (f x)". 
So, no, I think levity polymorphism is out. > > So I vote against splitting # into two: plain will do just fine. Levity polymorphism will not work without generating two copies of 'map', but plain polymorphism over 'Unlifted' is useful (as Dan has also pointed out.) In any case, I've extracted this out into a separate subproposal "Proposal 1.1". https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes#Proposal1.1:PolymorphismoveranewUnliftedkind (reordering here.) > (4) Fourth, you don't mention a related suggestion, namely to allow > newtype T = MkT Int# > with T getting kind #. I see no difficulty here. We do have (T ~R Int#). It's just a useful way of wrapping a newtype around an unlifted type. This is now "Proposal 2". > (3) Third, the stuff about Force and suspend. Provided you do no more than write library code that uses the above new features I'm fine. But there seems to be lots of stuff that dances around the hope that (Force a) is represented the same way as 'a'. I don't know how to make this fly. Is there a coercion in FC? If so then (a ~R Force a). And that seems very doubtful since we must do some evaluation. I agree that we can't introduce a coercion between 'Force a' and 'a', for the reason you mentioned. (there's also a second reason which is that 'a ~R Force a' is not well-typed; 'a' and 'Force a' have different kinds.) I've imagined that we might be able to just continue representing Force explicitly in Core, and somehow "compile it away" at STG time, but I am definitely fuzzy about how this is supposed to work. Perhaps Force should not actually be a data type, and we should have 'force# :: a -> Force a' and 'unforce# :: Force a -> a' (the latter of which compiles to a no-op.) 
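[Editor's note: Simon's `map` objection can be made concrete. Under lazy evaluation the cons cell stores a thunk for `f x`; with an unlifted element kind, `f x` would have to be evaluated before the cell is allocated, so one compiled body cannot serve both. A sketch in ordinary Haskell, with a bang pattern standing in for the strictness an `Unlifted` result kind would impose (the names are illustrative):]

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lifted version: (f x) is stored in the cons cell as a thunk, as usual.
mapLifted :: (a -> b) -> [a] -> [b]
mapLifted f (x:xs) = f x : mapLifted f xs
mapLifted _ []     = []

-- What an "unlifted-element" copy would have to compile to: force f x
-- before building the cell. The bang pattern models the evaluation that
-- an unlifted kind would make mandatory.
mapUnlifted :: (a -> b) -> [a] -> [b]
mapUnlifted f (x:xs) = let !y = f x in y : mapUnlifted f xs
mapUnlifted _ []     = []
```

This is why levity polymorphism needs either code duplication or a runtime representation decision that a single body cannot make.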
Cheers, Edward From simonpj at microsoft.com Mon Sep 7 21:37:37 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 21:37:37 +0000 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net> | Splitting # into two kinds is useful even if functions can't be levity | polymorphic. # contains a bunch of types that aren't represented | uniformly. Int# might be 32 bits while Double# is 64, etc. But | Unlifted would contain only types that are uniformly represented as | pointers, so you could write functions that are polymorphic over types | of kind Unlifted. Yes, I agree that's true, provided they are *not* also polymorphic over things of kind *. But it's an orthogonal proposal. What you say is already true of Array# and IORef#. Perhaps there are functions that are usefully polymorphic over boxed-but-unlifted things. But our users have not been crying out for this polymorphism despite the existence of a menagerie of existing such types, including Array# and IORef# Let's tackle things one at a time, with separate proposals and separate motivation. Simon | C++ style polymorphism-as-code-generation). | | ---- | | Also, with regard to the previous mail, it's not true that `suspend` | has to be a special form. All expressions with types of kind * are | 'special forms' in the necessary sense. | | -- Dan From simonpj at microsoft.com Mon Sep 7 21:52:27 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 21:52:27 +0000 Subject: Unlifted data types In-Reply-To: <1441661177-sup-2150@sabre> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <1441661177-sup-2150@sabre> Message-ID: <0f5878d44e584b6dae8fb7de6fdf1ca8@DB4PR30MB030.064d.mgd.msft.net> | I've split the document into three (four?) distinct subproposals. 
| Proposals 1 and 2 stand alone. I've re-numbered them 1,2,3,4, since 1.1 is (to me) a pretty major deal, stands in its own right, and certainly isn't a sub-proposal of (1). Under (new) 2, I'm very dubious about "Boxed levity polymorphism in types (and functions with extra code generation)". It's certainly true that we could generate two copies of the code for every function; but by generating three, or perhaps four copies we could also deal with Int# and Float#. Maybe one more for Double#. .NET does this on the fly, incidentally. Where do you stop? Also remember it's not just an issue of GC pointers. The semantics of the function changes, because things that are thunks for the lifted version become strict in the unlifted version. Your 'umap' is a bit more convincing. But for now (2) would be low on my priority list, until we encounter user pressure which (note) we have not encountered so far despite the range of boxed but unlifted types. Why is now the right time? (1) and (3) seem solid. I'll leave (4) for another message. Simon | -----Original Message----- | From: Edward Z. Yang [mailto:ezyang at mit.edu] | Sent: 07 September 2015 22:36 | To: Simon Peyton Jones | Cc: ghc-devs | Subject: RE: Unlifted data types | | Hello Simon, | | > There are several distinct things being mixed up. | | I've split the document into three (four?) distinct subproposals. | Proposals 1 and 2 stand alone. | | > (1) First, a proposal to allow a data type to be declared to be | unlifted. On its own, this is a pretty simple proposal: [snip] | > | > I would really like to see this articulated as a stand-alone proposal. | It makes sense by itself, and is really pretty simple. | | This is now "Proposal 1". | | > (2) Second, we cannot expect levity polymorphism. Consider | > map f (x:xs) = f x : map f xs | > Is the (f x) a thunk or is it evaluated strictly? 
Unless you are going | to clone the code for map (which levity polymorphism is there to avoid), | we can't answer "it depends on the type of (f x)". So, no, I think | levity polymorphism is out. | > | > So I vote against splitting # into two: plain will do just fine. | | Levity polymorphism will not work without generating two copies of | 'map', but plain polymorphism over 'Unlifted' is useful (as Dan has | also pointed out.) In any case, I've extracted this out into a | separate subproposal "Proposal 1.1". | https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes#Proposal1.1:Polym | orphismoveranewUnliftedkind | | (reordering here.) | | > (4) Fourth, you don't mention a related suggestion, namely to allow | > newtype T = MkT Int# | > with T getting kind #. I see no difficulty here. We do have (T ~R | Int#). It's just a useful way of wrapping a newtype around an unlifted | type. | | This is now "Proposal 2". | | > (3) Third, the stuff about Force and suspend. Provided you do no more | than write library code that uses the above new features I'm fine. But | there seems to be lots of stuff that dances around the hope that (Force | a) is represented the same way as 'a'. I don't' know how to make this | fly. Is there a coercion in FC? If so then (a ~R Force a). And that | seems very doubtful since we must do some evaluation. | | I agree that we can't introduce a coercion between 'Force a' and | 'a', for the reason you mentioned. (there's also a second reason which | is that 'a ~R Force a' is not well-typed; 'a' and 'Force a' have | different kinds.) | | I've imagined that we might be able to just continue representing | Force explicitly in Core, and somehow "compile it away" at STG time, | but I am definitely fuzzy about how this is supposed to work. Perhaps | Force should not actually be a data type, and we should have | 'force# :: a -> Force a' and 'unforce# :: Force a -> a' (the latter | of which compiles to a no-op.) 
| | Cheers, | Edward From simonpj at microsoft.com Mon Sep 7 21:55:09 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 21:55:09 +0000 Subject: Unlifted data types In-Reply-To: <1441661177-sup-2150@sabre> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <1441661177-sup-2150@sabre> Message-ID: <9cafcebc6d274b2385f202a4fd224174@DB4PR30MB030.064d.mgd.msft.net> | I agree that we can't introduce a coercion between 'Force a' and | 'a', for the reason you mentioned. (there's also a second reason which | is that 'a ~R Force a' is not well-typed; 'a' and 'Force a' have | different kinds.) | | I've imagined that we might be able to just continue representing | Force explicitly in Core, and somehow "compile it away" at STG time, | but I am definitely fuzzy about how this is supposed to work. Perhaps | Force should not actually be a data type, and we should have | 'force# :: a -> Force a' and 'unforce# :: Force a -> a' (the latter | of which compiles to a no-op.) I'm still doubtful. What is the problem you are trying to solve here? How does Force help us? Note that a singleton unboxed tuple (# e #) has the effect of suspending; e.g. f x = (# x+1 #) returns immediately, returning a pointer to a thunk for (x+1). I'm not sure if that is relevant. 
Simon From simonpj at microsoft.com Mon Sep 7 21:56:56 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 7 Sep 2015 21:56:56 +0000 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: This could make the menagerie of ways to pack {Small}{Mutable}Array{Array}# references into a {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the need for folks to descend into the use of the more evil structure primitives we're talking about, and letting us keep a few more principles around us. I'm lost. Can you give some concrete examples that illustrate how levity polymorphism will help us? Simon From: Edward Kmett [mailto:ekmett at gmail.com] Sent: 07 September 2015 21:17 To: Simon Peyton Jones Cc: Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates Subject: Re: ArrayArrays I had a brief discussion with Richard during the Haskell Symposium about how we might be able to let parametricity help a bit in reducing the space of necessary primops to a slightly more manageable level. Notably, it'd be interesting to explore the ability to allow parametricity over the portion of # that is just a gcptr. We could do this if the levity polymorphism machinery was tweaked a bit. You could envision the ability to abstract over things in both * and the subset of # that are represented by a gcptr, then modifying the existing array primitives to be parametric in that choice of levity for their argument so long as it was of a "heap object" levity. 
This could make the menagerie of ways to pack {Small}{Mutable}Array{Array}# references into a {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the need for folks to descend into the use of the more evil structure primitives we're talking about, and letting us keep a few more principles around us. Then in the cases like `atomicModifyMutVar#` where it needs to actually be in * rather than just a gcptr, due to the constructed field selectors it introduces on the heap then we could keep the existing less polymorphic type. -Edward On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > wrote: It was fun to meet and discuss this. Did someone volunteer to write a wiki page that describes the proposed design? And, I earnestly hope, also describes the menagerie of currently available array types and primops so that users can have some chance of picking the right one?! Thanks Simon From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ryan Newton Sent: 31 August 2015 23:11 To: Edward Kmett; Johan Tibell Cc: Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates Subject: Re: ArrayArrays Dear Edward, Ryan Yates, and other interested parties -- So when should we meet up about this? May I propose the Tues afternoon break for everyone at ICFP who is interested in this topic? We can meet out in the coffee area and congregate around Edward Kmett, who is tall and should be easy to find ;-). I think Ryan is going to show us how to use his new primops for combined array + other fields in one heap object? On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett > wrote: Without a custom primitive it doesn't help much there, you have to store the indirection to the mask. With a custom primitive it should cut the on heap root-to-leaf path of everything in the HAMT in half. A shorter HashMap was actually one of the motivating factors for me doing this. 
It is rather astoundingly difficult to beat the performance of HashMap, so I had to start cheating pretty badly. ;) -Edward On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > wrote: I'd also be interested to chat at ICFP to see if I can use this for my HAMT implementation. On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > wrote: Sounds good to me. Right now I'm just hacking up composable accessors for "typed slots" in a fairly lens-like fashion, and treating the set of slots I define and the 'new' function I build for the data type as its API, and build atop that. This could eventually graduate to template-haskell, but I'm not entirely satisfied with the solution I have. I currently distinguish between what I'm calling "slots" (things that point directly to another SmallMutableArrayArray# sans wrapper) and "fields" which point directly to the usual Haskell data types because unifying the two notions meant that I couldn't lift some coercions out "far enough" to make them vanish. I'll be happy to run through my current working set of issues in person and -- as things get nailed down further -- in a longer lived medium than in personal conversations. ;) -Edward On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > wrote: I'd also love to meet up at ICFP and discuss this. I think the array primops plus a TH layer that lets (ab)use them many times without too much marginal cost sounds great. And I'd like to learn how we could be either early users of, or help with, this infrastructure. CC'ing in Ryan Scot and Omer Agacan who may also be interested in dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student who is currently working on concurrent data structures in Haskell, but will not be at ICFP. On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > wrote: I completely agree. I would love to spend some time during ICFP and friends talking about what it could look like. My small array for STM changes for the RTS can be seen here [1]. 
It is on a branch somewhere between 7.8 and 7.10 and includes irrelevant STM bits and some confusing naming choices (sorry), but should cover all the details needed to implement it for a non-STM context. The biggest surprise for me was following small array too closely and having a word/byte offset miss-match [2]. [1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 Ryan On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett > wrote: > I'd love to have that last 10%, but its a lot of work to get there and more > importantly I don't know quite what it should look like. > > On the other hand, I do have a pretty good idea of how the primitives above > could be banged out and tested in a long evening, well in time for 7.12. And > as noted earlier, those remain useful even if a nicer typed version with an > extra level of indirection to the sizes is built up after. > > The rest sounds like a good graduate student project for someone who has > graduate students lying around. Maybe somebody at Indiana University who has > an interest in type theory and parallelism can find us one. =) > > -Edward > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates > wrote: >> >> I think from my perspective, the motivation for getting the type >> checker involved is primarily bringing this to the level where users >> could be expected to build these structures. it is reasonable to >> think that there are people who want to use STM (a context with >> mutation already) to implement a straight forward data structure that >> avoids extra indirection penalty. There should be some places where >> knowing that things are field accesses rather then array indexing >> could be helpful, but I think GHC is good right now about handling >> constant offsets. In my code I don't do any bounds checking as I know >> I will only be accessing my arrays with constant indexes. 
I make >> wrappers for each field access and leave all the unsafe stuff in >> there. When things go wrong though, the compiler is no help. Maybe >> template Haskell that generates the appropriate wrappers is the right >> direction to go. >> There is another benefit for me when working with these as arrays in >> that it is quite simple and direct (given the hoops already jumped >> through) to play with alignment. I can ensure two pointers are never >> on the same cache-line by just spacing things out in the array. >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett > wrote: >> > They just segfault at this level. ;) >> > >> > Sent from my iPhone >> > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton > wrote: >> > >> > You presumably also save a bounds check on reads by hard-coding the >> > sizes? >> > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett > wrote: >> >> >> >> Also there are 4 different "things" here, basically depending on two >> >> independent questions: >> >> >> >> a.) if you want to shove the sizes into the info table, and >> >> b.) if you want cardmarking. >> >> >> >> Versions with/without cardmarking for different sizes can be done >> >> pretty >> >> easily, but as noted, the infotable variants are pretty invasive. >> >> >> >> -Edward >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > wrote: >> >>> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up >> >>> if >> >>> they were small enough and there are enough of them. You get a bit >> >>> better >> >>> locality of reference in terms of what fits in the first cache line of >> >>> them. >> >>> >> >>> -Edward >> >>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >> >>> wrote: >> >>>> >> >>>> Yes. And for the short term I can imagine places we will settle with >> >>>> arrays even if it means tracking lengths unnecessarily and >> >>>> unsafeCoercing >> >>>> pointers whose types don't actually match their siblings. 
>> >>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed sized >> >>>> array >> >>>> objects *other* than using them to fake structs? (Much to >> >>>> derecommend, as >> >>>> you mentioned!) >> >>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >> >>>> wrote: >> >>>>> >> >>>>> I think both are useful, but the one you suggest requires a lot more >> >>>>> plumbing and doesn't subsume all of the usecases of the other. >> >>>>> >> >>>>> -Edward >> >>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >> >>>>> wrote: >> >>>>>> >> >>>>>> So that primitive is an array like thing (Same pointed type, >> >>>>>> unbounded >> >>>>>> length) with extra payload. >> >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, >> >>>>>> especially >> >>>>>> with the extra payload at front. But wouldn't the general solution >> >>>>>> for >> >>>>>> structs be one that that allows new user data type defs for # >> >>>>>> types? >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >> >>>>>>> known >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >> >>>>>>> above, but >> >>>>>>> where the word counts were stored in the objects themselves. >> >>>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >> >>>>>>> likely >> >>>>>>> want to be something we build in addition to MutVar# rather than a >> >>>>>>> replacement. >> >>>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and build info >> >>>>>>> tables that knew them, and typechecker support, for instance, it'd >> >>>>>>> get >> >>>>>>> rather invasive. 
>> >>>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' versions >> >>>>>>> above, like working with evil unsized c-style arrays directly >> >>>>>>> inline at the >> >>>>>>> end of the structure cease to be possible, so it isn't even a pure >> >>>>>>> win if we >> >>>>>>> did the engineering effort. >> >>>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by adding the one >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >>>>>>> >> >>>>>>> -Edward >> >>>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton > >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable structs >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's hard. >> >>>>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of object >> >>>>>>>> identity problems. But what about directly supporting an >> >>>>>>>> extensible set of >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) >> >>>>>>>> MutVar#? That >> >>>>>>>> may be too much work, but is it problematic otherwise? >> >>>>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want best in >> >>>>>>>> class >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential >> >>>>>>>> counterparts. >> >>>>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >>>>>>>> > wrote: >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >> >>>>>>>>> maybe >> >>>>>>>>> make a ticket for it. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Thanks >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Simon >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >>>>>>>>> To: Simon Peyton Jones >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> While those live in #, they are garbage collected objects, so >> >>>>>>>>> this >> >>>>>>>>> all lives on the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has >> >>>>>>>>> to >> >>>>>>>>> deal with nested arrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The Problem >> >>>>>>>>> >> >>>>>>>>> ----------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked >> >>>>>>>>> list >> >>>>>>>>> in Haskell. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers >> >>>>>>>>> on >> >>>>>>>>> the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> >> >>>>>>>>> Maybe >> >>>>>>>>> DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> That is 3 levels of indirection. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >> >>>>>>>>> worsening our representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This means that every operation we perform on this structure >> >>>>>>>>> will >> >>>>>>>>> be about half of the speed of an implementation in most other >> >>>>>>>>> languages >> >>>>>>>>> assuming we're memory bound on loading things into cache! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Making Progress >> >>>>>>>>> >> >>>>>>>>> ---------------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I have been working on a number of data structures where the >> >>>>>>>>> indirection of going from something in * out to an object in # >> >>>>>>>>> which >> >>>>>>>>> contains the real pointer to my target and coming back >> >>>>>>>>> effectively doubles >> >>>>>>>>> my runtime. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >> >>>>>>>>> MutVar# >> >>>>>>>>> onto the mutable list when we dirty it. There is a well defined >> >>>>>>>>> write-barrier. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I could change out the representation to use >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, >> >>>>>>>>> but >> >>>>>>>>> this doesn't help _much_ directly. 
It has reduced the amount of >> >>>>>>>>> distinct >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per >> >>>>>>>>> object to 2. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the >> >>>>>>>>> array >> >>>>>>>>> object and then chase it to the next DLL and chase that to the >> >>>>>>>>> next array. I >> >>>>>>>>> do get my two pointers together in memory though. I'm paying for >> >>>>>>>>> a card >> >>>>>>>>> marking table as well, which I don't particularly need with just >> >>>>>>>>> two >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" >> >>>>>>>>> machinery added >> >>>>>>>>> back in 7.10, which is just the old array code as a new data >> >>>>>>>>> type, which can >> >>>>>>>>> speed things up a bit when you don't have very big arrays: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and have two >> >>>>>>>>> mutable fields and be able to share the same write barrier? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. >> >>>>>>>>> What >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the >> >>>>>>>>> impedance >> >>>>>>>>> mismatch between the imperative world and Haskell, and then just >> >>>>>>>>> let the >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a >> >>>>>>>>> special >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even >> >>>>>>>>> abuse pattern >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to >> >>>>>>>>> make this >> >>>>>>>>> cheaper. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArrayArray# and >> >>>>>>>>> writeMutableArrayArrayArray# calls to directly access the preceding >> >>>>>>>>> and next >> >>>>>>>>> entry in the linked list. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a >> >>>>>>>>> strict world, and everything there lives in #. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> next :: DLL -> IO DLL >> >>>>>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArrayArray# m 1# s of >> >>>>>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty >> >>>>>>>>> easily when they >> >>>>>>>>> are known strict and you chain operations of this sort! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Cleaning it Up >> >>>>>>>>> >> >>>>>>>>> ------------------ >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Now I have one outermost indirection pointing to an array that >> >>>>>>>>> points directly to other arrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I can >> >>>>>>>>> fix >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using a >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a >> >>>>>>>>> mixture of >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data structure. >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the >> >>>>>>>>> existing >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the >> >>>>>>>>> arguments it >> >>>>>>>>> takes. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that >> >>>>>>>>> would >> >>>>>>>>> be best left unboxed. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently >> >>>>>>>>> at >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a >> >>>>>>>>> boxed or at a >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int in >> >>>>>>>>> question in >> >>>>>>>>> there. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having to >> >>>>>>>>> go off to >> >>>>>>>>> the side costs me the entire win from avoiding the first pointer >> >>>>>>>>> chase. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could >> >>>>>>>>> construct that had n words with unsafe access and m pointers to >> >>>>>>>>> other heap >> >>>>>>>>> objects, one that could put itself on the mutable list when any >> >>>>>>>>> of those >> >>>>>>>>> pointers changed then I could shed this last factor of two in >> >>>>>>>>> all >> >>>>>>>>> circumstances. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Prototype >> >>>>>>>>> >> >>>>>>>>> ------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Over the last few days I've put together a small prototype >> >>>>>>>>> implementation with a few non-trivial imperative data structures >> >>>>>>>>> for things >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and >> >>>>>>>>> order-maintenance. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> https://github.com/ekmett/structs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Notable bits: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of >> >>>>>>>>> link-cut >> >>>>>>>>> trees in this style. 
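For context, the "masks and administrivia" of a hash-array-mapped-trie amount to a bitmap plus popCount indexing; a hedged, boxed illustration (the `sparseIndex` name is mine) of the word Edward wants to keep adjacent to the pointers rather than behind another indirection:

```haskell
import Data.Bits (popCount, testBit, shiftL, (.&.))
import Data.Word (Word64)

-- A HAMT node stores a bitmap with one bit set per present child.
-- To find a child's slot in the compressed child array, count the
-- set bits below the child's hash fragment.
sparseIndex :: Word64   -- node bitmap
            -> Int      -- hash fragment (0..63)
            -> Maybe Int
sparseIndex bitmap frag
  | testBit bitmap frag = Just (popCount (bitmap .&. (bit' - 1)))
  | otherwise           = Nothing
  where
    bit' = 1 `shiftL` frag
```

If the bitmap lives behind a MutableByteArray# off to the side, every step of the walk pays that extra chase, which is the cost being described above.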
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that >> >>>>>>>>> make >> >>>>>>>>> it go fast. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost >> >>>>>>>>> all >> >>>>>>>>> the references to the LinkCut or Object data constructor get >> >>>>>>>>> optimized away, >> >>>>>>>>> and we're left with beautiful strict code directly mutating our >> >>>>>>>>> underlying >> >>>>>>>>> representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -Edward >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >> >>>>>>>>> > wrote: >> >>>>>>>>> >> >>>>>>>>> Just to say that I have no idea what is going on in this thread. >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a >> >>>>>>>>> ticket? Is >> >>>>>>>>> there a wiki page? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a >> >>>>>>>>> good >> >>>>>>>>> thing. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Simon >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf >> >>>>>>>>> Of >> >>>>>>>>> Edward Kmett >> >>>>>>>>> Sent: 21 August 2015 05:25 >> >>>>>>>>> To: Manuel M T Chakravarty >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be >> >>>>>>>>> very handy as well. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider right now if I have something like an order-maintenance >> >>>>>>>>> structure I have: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Upper s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Lower s)) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and two >> >>>>>>>>> pointers, >> >>>>>>>>> one for forward and one for backwards. The latter is basically >> >>>>>>>>> the same >> >>>>>>>>> thing with a mutable reference up pointing at the structure >> >>>>>>>>> above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On the heap this is an object that points to a structure for the >> >>>>>>>>> bytearray, and points to another structure for each mutvar which >> >>>>>>>>> each point >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >> >>>>>>>>> indirection smeared >> >>>>>>>>> over everything. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link >> >>>>>>>>> from >> >>>>>>>>> the structure below to the structure above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and >> >>>>>>>>> the >> >>>>>>>>> next 2 slots pointing to the previous and next previous objects, >> >>>>>>>>> represented >> >>>>>>>>> just as their MutableArrayArray#s. 
I can use >> >>>>>>>>> sameMutableArrayArray# on these >> >>>>>>>>> for object identity, which lets me check for the ends of the >> >>>>>>>>> lists by tying >> >>>>>>>>> things back on themselves. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and below that >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to >> >>>>>>>>> an >> >>>>>>>>> upper structure. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting out the >> >>>>>>>>> slots >> >>>>>>>>> in question, while it has gained a level of indirection between >> >>>>>>>>> the wrapper >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can >> >>>>>>>>> be basically >> >>>>>>>>> erased by ghc. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on the heap >> >>>>>>>>> for >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the >> >>>>>>>>> object itself, >> >>>>>>>>> and the MutableByteArray# that it references to carry around the >> >>>>>>>>> mutable >> >>>>>>>>> int. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The only pain points are >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me >> >>>>>>>>> from >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an >> >>>>>>>>> ArrayArray >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of >> >>>>>>>>> Haskell, >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 pointers >> >>>>>>>>> wide. Card >> >>>>>>>>> marking doesn't help. 
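The "tying things back on themselves" end-of-list test that Edward mentions can be demonstrated in the boxed world too, with IORef identity standing in for sameMutableArrayArray#; a hedged sketch using only base (the `Node`, `newSelfNode`, and `isLast` names are mine):

```haskell
import Control.Monad.Fix (mfix)
import Data.IORef

-- A node whose next link initially points back at itself; pointer
-- equality on the link then detects the end of the list, the boxed
-- analogue of checking sameMutableArrayArray# against the node itself.
data Node = Node { value :: Int, next :: IORef Node }

-- mfix (fixIO) lets the node's link refer to the node being built.
newSelfNode :: Int -> IO Node
newSelfNode x = mfix $ \node -> Node x <$> newIORef node

isLast :: Node -> IO Bool
isLast node = do
  successor <- readIORef (next node)
  return (next successor == next node)  -- IORef identity test
```

The unlifted version replaces the IORef with a slot of the node's own MutableArrayArray#, but the sentinel trick is the same.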
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things and >> >>>>>>>>> convert >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >> >>>>>>>>> unsafeCoerce my way >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >> >>>>>>>>> directly into the >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by >> >>>>>>>>> aping the >> >>>>>>>>> MutableArrayArray# s API, but that gets really really dangerous! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the >> >>>>>>>>> altar >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them >> >>>>>>>>> and collect >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -Edward >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >> >>>>>>>>> > wrote: >> >>>>>>>>> >> >>>>>>>>> That?s an interesting idea. >> >>>>>>>>> >> >>>>>>>>> Manuel >> >>>>>>>>> >> >>>>>>>>> > Edward Kmett >: >> >>>>>>>>> >> >>>>>>>>> > >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >> >>>>>>>>> > ArrayArray# entries >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection for >> >>>>>>>>> > the containing >> >>>>>>>>> > structure is amazing, but I can only currently use it if my >> >>>>>>>>> > leaf level data >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd be >> >>>>>>>>> > nice to be >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at >> >>>>>>>>> > the leaves to >> >>>>>>>>> > hold lifted contents. 
>> >>>>>>>>> > >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to >> >>>>>>>>> > access >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do that >> >>>>>>>>> > if i tried to >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a >> >>>>>>>>> > ByteArray# >> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing >> >>>>>>>>> > this. >> >>>>>>>>> > >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I >> >>>>>>>>> > could shoehorn a >> >>>>>>>>> > number of them into ArrayArrays if this worked. >> >>>>>>>>> > >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >> >>>>>>>>> > indirection compared to c/java and this could reduce that pain >> >>>>>>>>> > to just 1 >> >>>>>>>>> > level of unnecessary indirection. >> >>>>>>>>> > >> >>>>>>>>> > -Edward >> >>>>>>>>> >> >>>>>>>>> > _______________________________________________ >> >>>>>>>>> > ghc-devs mailing list >> >>>>>>>>> > ghc-devs at haskell.org >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> _______________________________________________ >> >>>>>>>>> ghc-devs mailing list >> >>>>>>>>> ghc-devs at haskell.org >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>> >> >>>>>>> >> >>>>> >> >>> >> >> >> > >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > > > _______________________________________________ ghc-devs mailing list ghc-devs at haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From ezyang at mit.edu Mon Sep 7 22:08:48 2015 From: ezyang at mit.edu (Edward Z. 
Yang) Date: Mon, 07 Sep 2015 15:08:48 -0700 Subject: Unlifted data types In-Reply-To: <9cafcebc6d274b2385f202a4fd224174@DB4PR30MB030.064d.mgd.msft.net> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <1441661177-sup-2150@sabre> <9cafcebc6d274b2385f202a4fd224174@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <1441663307-sup-612@sabre> Excerpts from Simon Peyton Jones's message of 2015-09-07 14:55:09 -0700: > I'm still doubtful. What is the problem you are trying to solve here? How does Force help us? The problem 'Force' is trying to solve is the fact that Haskell currently has many existing lifted data types, and they all have ~essentially identical unlifted versions. But for a user to write the lifted and unlifted version, they have to copy paste their code or use 'Force'. > Note that a singleton unboxed tuple (# e #) has the effect of suspending; e.g. > f x = (# x+1 #) > return immediately, returning a pointer to a thunk for (x+1). I'm not sure if that is relevant. I don't think so? Unboxed tuples take a computation with kind * and represent it in kind #. But 'suspend' takes a computation in kind # and represents in kind *. Edward From dan.doel at gmail.com Mon Sep 7 23:09:18 2015 From: dan.doel at gmail.com (Dan Doel) Date: Mon, 7 Sep 2015 19:09:18 -0400 Subject: Unlifted data types In-Reply-To: <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net> Message-ID: On Mon, Sep 7, 2015 at 5:37 PM, Simon Peyton Jones wrote: > But it's an orthogonal proposal. What you say is already true of Array# and IORef#. Perhaps there are functions that are usefully polymorphic over boxed-but-unlifted things. 
But our users have not been crying out for this polymorphism despite the existence of a menagerie of existing such types, including Array# and IORef# Well, evidently people over in the ArrayArray thread want it, for one. But also, if general unlifted types get accepted, there are many possible uses. For instance, people working with concurrency have to worry about work being done in the correct thread, and there are functions on MVars and whatnot that ensure that threads don't simply pass thunks between each other. But, if you have unlifted types, then you can have: data UMVar (a :: Unlifted) and then the type rules out the possibility of passing thunks through a reference (at least at the top level). But this requires polymorphism to avoid having to create a separate type for each unlifted type. This is also a use case of `Force`, since it is likely that we want to put ordinary data types in the MVars, just ensure that we aren't passing thunks with delayed work. ---- I'm kind of down on being polymorphic over choice of evaluation order, as well. At least without any further motivation. ---- Also, I'd still like to synthesize some of the redundancies introduced by the proposal. Perhaps it could be done by making `Force` a more primitive building block than !. I.E. data Nat = Zero | Suc !Nat can be, under this proposal, considered sugar for something like: data Nat = Zero | Suc_INTERNAL# (Force Nat) pattern Suc x = Suc_INTERNAL# (Force x) and all stipulations about what you can UNPACK are actually about Unlifted fields, rather than ! fields (which just inherit them from Force). That still leaves `Force LiftedSum` vs. `UnliftedSum`, though. And to be honest, I'm not sure we need arbitrary data types in Unlifted; Force (which would be primitive) might be enough. 
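Dan's `UMVar` can't be written today, but its observable guarantee, never handing a thunk across the reference at the top level, can be approximated in lifted Haskell; a hedged sketch using only base (the `StrictMVar` name and API are mine):

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate)

-- An MVar that forces its contents to WHNF on the way in, so a reader
-- in another thread never inherits the writer's deferred work (at the
-- top level). 'UMVar (a :: Unlifted)' would make this a static
-- guarantee of the type rather than a dynamic discipline.
newtype StrictMVar a = StrictMVar (MVar a)

newStrictMVar :: a -> IO (StrictMVar a)
newStrictMVar x = StrictMVar <$> (evaluate x >>= newMVar)

putStrictMVar :: StrictMVar a -> a -> IO ()
putStrictMVar (StrictMVar m) x = evaluate x >>= putMVar m

takeStrictMVar :: StrictMVar a -> IO a
takeStrictMVar (StrictMVar m) = takeMVar m
```

Note this only forces to WHNF, which is also all the unlifted kind promises; deep evaluation is a separate concern.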
-- Dan From ekmett at gmail.com Mon Sep 7 23:14:35 2015 From: ekmett at gmail.com (Edward Kmett) Date: Mon, 7 Sep 2015 19:14:35 -0400 Subject: ArrayArrays In-Reply-To: References: <4DACFC45-0E7E-4B3F-8435-5365EC3F7749@cse.unsw.edu.au> <65158505c7be41afad85374d246b7350@DB4PR30MB030.064d.mgd.msft.net> <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: Assume we had the ability to talk about Levity in a new way and instead of just: data Levity = Lifted | Unlifted type * = TYPE 'Lifted type # = TYPE 'Unlifted we instead had a more nuanced notion of TYPE parameterized on another data type: data Levity = Lifted | Unlifted data Param = Composite | Simple Levity and we parameterized TYPE with a Param rather than Levity. Existing strange representations can continue to live in TYPE 'Composite (# Int# , Double #) :: TYPE 'Composite and we don't support parametricity in there, just like, currently we don't allow parametricity in #. We can include the undefined example from Richard's talk: undefined :: forall (v :: Param). v and ultimately lift it into his pi type when it is available just as before. But we could consider TYPE ('Simple 'Unlifted) as a form of 'parametric #' covering unlifted things we're willing to allow polymorphism over because they are just pointers to something in the heap, that just happens to not be able to be _|_ or a thunk. In this setting, recalling that above, I modified Richard's TYPE to take a Param instead of Levity, we can define a type alias for things that live as a simple pointer to a heap allocated object: type GC (l :: Levity) = TYPE ('Simple l) type * = GC 'Lifted and then we can look at existing primitives generalized: Array# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted MutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted SmallArray# :: forall (l :: Levity) (a :: GC l). 
a -> GC 'Unlifted SmallMutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted MutVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted MVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted Weak#, StablePtr#, StableName#, etc. all can take similar modifications. Recall that an ArrayArray# was just an Array# hacked up to be able to hold onto the subset of # that is collectable. Almost all of the operations on these data types can work on the more general kind of argument. newArray# :: forall (s :: *) (l :: Levity) (a :: GC l). Int# -> a -> State# s -> (# State# s, MutableArray# s a #) writeArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s a -> Int# -> a -> State# s -> State# s readArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s a -> Int# -> State# s -> (# State# s, a #) etc. Only a couple of our existing primitives _can't_ generalize this way. The one that leaps to mind is atomicModifyMutVar, which would need to stay constrained to only work on arguments in *, because of the way it operates. With that we can still talk about MutableArray# s Int but now we can also talk about: MutableArray# s (MutableArray# s Int) without the layer of indirection through a box in * and without an explosion of primops. The same newFoo, readFoo, writeFoo machinery works for both kinds. The struct machinery doesn't get to take advantage of this, but it would let us clean house elsewhere in Prim and drastically improve the range of applicability of the existing primitives with nothing more than a small change to the levity machinery. I'm not attached to any of the names above, I coined them just to give us a concrete thing to talk about. 
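As a historical footnote, something very close to Edward's `GC l` did eventually land in GHC (9.2 and later) as `RuntimeRep`'s `BoxedRep` constructor over a `Levity` kind; a hedged sketch assuming a modern GHC, where the `GC` alias is mine but `BoxedRep` and `Levity` are real exports of GHC.Exts:

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import GHC.Exts

-- Edward's 'GC l' is spelled TYPE ('BoxedRep l) in today's GHC:
-- a kind covering boxed heap pointers of either levity.
type GC (l :: Levity) = TYPE ('BoxedRep l)

-- Ordinary lifted types live at GC 'Lifted (i.e. Type itself),
-- and in GHC 9.4+ the array primitives were generalized to be
-- levity-polymorphic in their element, much as proposed above.
type LiftedKind = GC 'Lifted
```

This compiles only on GHC 9.2 or later; on older compilers `BoxedRep` does not exist.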
Here I'm only proposing we extend machinery in GHC.Prim this way, but an interesting 'now that the barn door is open' question is to consider that our existing Haskell data types often admit a similar form of parametricity and nothing in principle prevents this from working for Maybe or [] and once you permit inference to fire across all of GC l then it seems to me that you'd start to get those same capabilities there as well when LevityPolymorphism was turned on. -Edward On Mon, Sep 7, 2015 at 5:56 PM, Simon Peyton Jones wrote: > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the > need for folks to descend into the use of the more evil structure > primitives we're talking about, and letting us keep a few more principles > around us. > > > > I'm lost. Can you give some concrete examples that illustrate how levity > polymorphism will help us? > > > Simon > > > > *From:* Edward Kmett [mailto:ekmett at gmail.com] > *Sent:* 07 September 2015 21:17 > *To:* Simon Peyton Jones > *Cc:* Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T Chakravarty; > Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays > > > > I had a brief discussion with Richard during the Haskell Symposium about > how we might be able to let parametricity help a bit in reducing the space > of necessary primops to a slightly more manageable level. > > > > Notably, it'd be interesting to explore the ability to allow parametricity > over the portion of # that is just a gcptr. > > > > We could do this if the levity polymorphism machinery was tweaked a bit. > You could envision the ability to abstract over things in both * and the > subset of # that are represented by a gcptr, then modifying the existing > array primitives to be parametric in that choice of levity for their > argument so long as it was of a "heap object" levity. 
> > > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the > need for folks to descend into the use of the more evil structure > primitives we're talking about, and letting us keep a few more principles > around us. > > > > Then in the cases like `atomicModifyMutVar#` where it needs to actually be > in * rather than just a gcptr, due to the constructed field selectors it > introduces on the heap then we could keep the existing less polymorphic > type. > > > > -Edward > > > > On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > wrote: > > It was fun to meet and discuss this. > > > > Did someone volunteer to write a wiki page that describes the proposed > design? And, I earnestly hope, also describes the menagerie of currently > available array types and primops so that users can have some chance of > picking the right one?! > > > > Thanks > > > > Simon > > > > *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan > Newton > *Sent:* 31 August 2015 23:11 > *To:* Edward Kmett; Johan Tibell > *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; > Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays > > > > Dear Edward, Ryan Yates, and other interested parties -- > > > > So when should we meet up about this? > > > > May I propose the Tues afternoon break for everyone at ICFP who is > interested in this topic? We can meet out in the coffee area and > congregate around Edward Kmett, who is tall and should be easy to find ;-). > > > > I think Ryan is going to show us how to use his new primops for combined > array + other fields in one heap object? > > > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: > > Without a custom primitive it doesn't help much there, you have to store > the indirection to the mask. 
> > > > With a custom primitive it should cut the on heap root-to-leaf path of > everything in the HAMT in half. A shorter HashMap was actually one of the > motivating factors for me doing this. It is rather astoundingly difficult > to beat the performance of HashMap, so I had to start cheating pretty > badly. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > wrote: > > I'd also be interested to chat at ICFP to see if I can use this for my > HAMT implementation. > > > > On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote: > > Sounds good to me. Right now I'm just hacking up composable accessors for > "typed slots" in a fairly lens-like fashion, and treating the set of slots > I define and the 'new' function I build for the data type as its API, and > build atop that. This could eventually graduate to template-haskell, but > I'm not entirely satisfied with the solution I have. I currently > distinguish between what I'm calling "slots" (things that point directly to > another SmallMutableArrayArray# sans wrapper) and "fields" which point > directly to the usual Haskell data types because unifying the two notions > meant that I couldn't lift some coercions out "far enough" to make them > vanish. > > > > I'll be happy to run through my current working set of issues in person > and -- as things get nailed down further -- in a longer lived medium than > in personal conversations. ;) > > > > -Edward > > > > On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote: > > I'd also love to meet up at ICFP and discuss this. I think the array > primops plus a TH layer that lets (ab)use them many times without too much > marginal cost sounds great. And I'd like to learn how we could be either > early users of, or help with, this infrastructure. > > > > CC'ing in Ryan Scot and Omer Agacan who may also be interested in dropping > in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. 
student who is > currently working on concurrent data structures in Haskell, but will not be > at ICFP. > > > > > > On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote: > > I completely agree. I would love to spend some time during ICFP and > friends talking about what it could look like. My small array for STM > changes for the RTS can be seen here [1]. It is on a branch somewhere > between 7.8 and 7.10 and includes irrelevant STM bits and some > confusing naming choices (sorry), but should cover all the details > needed to implement it for a non-STM context. The biggest surprise > for me was following small array too closely and having a word/byte > offset miss-match [2]. > > [1]: > https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut > [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 > > Ryan > > > On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote: > > I'd love to have that last 10%, but its a lot of work to get there and > more > > importantly I don't know quite what it should look like. > > > > On the other hand, I do have a pretty good idea of how the primitives > above > > could be banged out and tested in a long evening, well in time for 7.12. > And > > as noted earlier, those remain useful even if a nicer typed version with > an > > extra level of indirection to the sizes is built up after. > > > > The rest sounds like a good graduate student project for someone who has > > graduate students lying around. Maybe somebody at Indiana University who > has > > an interest in type theory and parallelism can find us one. =) > > > > -Edward > > > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates wrote: > >> > >> I think from my perspective, the motivation for getting the type > >> checker involved is primarily bringing this to the level where users > >> could be expected to build these structures. 
it is reasonable to > >> think that there are people who want to use STM (a context with > >> mutation already) to implement a straight forward data structure that > >> avoids extra indirection penalty. There should be some places where > >> knowing that things are field accesses rather then array indexing > >> could be helpful, but I think GHC is good right now about handling > >> constant offsets. In my code I don't do any bounds checking as I know > >> I will only be accessing my arrays with constant indexes. I make > >> wrappers for each field access and leave all the unsafe stuff in > >> there. When things go wrong though, the compiler is no help. Maybe > >> template Haskell that generates the appropriate wrappers is the right > >> direction to go. > >> There is another benefit for me when working with these as arrays in > >> that it is quite simple and direct (given the hoops already jumped > >> through) to play with alignment. I can ensure two pointers are never > >> on the same cache-line by just spacing things out in the array. > >> > >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett wrote: > >> > They just segfault at this level. ;) > >> > > >> > Sent from my iPhone > >> > > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton wrote: > >> > > >> > You presumably also save a bounds check on reads by hard-coding the > >> > sizes? > >> > > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett > wrote: > >> >> > >> >> Also there are 4 different "things" here, basically depending on two > >> >> independent questions: > >> >> > >> >> a.) if you want to shove the sizes into the info table, and > >> >> b.) if you want cardmarking. > >> >> > >> >> Versions with/without cardmarking for different sizes can be done > >> >> pretty > >> >> easily, but as noted, the infotable variants are pretty invasive. 
> >> >> > >> >> -Edward > >> >> > >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett > wrote: > >> >>> > >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up > >> >>> if > >> >>> they were small enough and there are enough of them. You get a bit > >> >>> better > >> >>> locality of reference in terms of what fits in the first cache line > of > >> >>> them. > >> >>> > >> >>> -Edward > >> >>> > >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton > >> >>> wrote: > >> >>>> > >> >>>> Yes. And for the short term I can imagine places we will settle > with > >> >>>> arrays even if it means tracking lengths unnecessarily and > >> >>>> unsafeCoercing > >> >>>> pointers whose types don't actually match their siblings. > >> >>>> > >> >>>> Is there anything to recommend the hacks mentioned for fixed sized > >> >>>> array > >> >>>> objects *other* than using them to fake structs? (Much to > >> >>>> derecommend, as > >> >>>> you mentioned!) > >> >>>> > >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett > >> >>>> wrote: > >> >>>>> > >> >>>>> I think both are useful, but the one you suggest requires a lot > more > >> >>>>> plumbing and doesn't subsume all of the usecases of the other. > >> >>>>> > >> >>>>> -Edward > >> >>>>> > >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> So that primitive is an array like thing (Same pointed type, > >> >>>>>> unbounded > >> >>>>>> length) with extra payload. > >> >>>>>> > >> >>>>>> I can see how we can do without structs if we have arrays, > >> >>>>>> especially > >> >>>>>> with the extra payload at front. But wouldn't the general > solution > >> >>>>>> for > >> >>>>>> structs be one that that allows new user data type defs for # > >> >>>>>> types? 
> >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett > >> >>>>>> wrote: > >> >>>>>>> > >> >>>>>>> Some form of MutableStruct# with a known number of words and a > >> >>>>>>> known > >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting > >> >>>>>>> above, but > >> >>>>>>> where the word counts were stored in the objects themselves. > >> >>>>>>> > >> >>>>>>> Given that it'd have a couple of words for those counts it'd > >> >>>>>>> likely > >> >>>>>>> want to be something we build in addition to MutVar# rather > than a > >> >>>>>>> replacement. > >> >>>>>>> > >> >>>>>>> On the other hand, if we had to fix those numbers and build info > >> >>>>>>> tables that knew them, and typechecker support, for instance, > it'd > >> >>>>>>> get > >> >>>>>>> rather invasive. > >> >>>>>>> > >> >>>>>>> Also, a number of things that we can do with the 'sized' > versions > >> >>>>>>> above, like working with evil unsized c-style arrays directly > >> >>>>>>> inline at the > >> >>>>>>> end of the structure cease to be possible, so it isn't even a > pure > >> >>>>>>> win if we > >> >>>>>>> did the engineering effort. > >> >>>>>>> > >> >>>>>>> I think 90% of the needs I have are covered just by adding the > one > >> >>>>>>> primitive. The last 10% gets pretty invasive. > >> >>>>>>> > >> >>>>>>> -Edward > >> >>>>>>> > >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton < > rrnewton at gmail.com> > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> I like the possibility of a general solution for mutable > structs > >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's > hard. > >> >>>>>>>> > >> >>>>>>>> So, we can't unpack MutVar into constructors because of object > >> >>>>>>>> identity problems. But what about directly supporting an > >> >>>>>>>> extensible set of > >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) > >> >>>>>>>> MutVar#? 
That > >> >>>>>>>> may be too much work, but is it problematic otherwise? > >> >>>>>>>> > >> >>>>>>>> Needless to say, this is also critical if we ever want best in > >> >>>>>>>> class > >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential > >> >>>>>>>> counterparts. > >> >>>>>>>> > >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones > >> >>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take this email and turn it into a > short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and > >> >>>>>>>>> maybe > >> >>>>>>>>> make a ticket for it. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Thanks > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com] > >> >>>>>>>>> Sent: 27 August 2015 16:54 > >> >>>>>>>>> To: Simon Peyton Jones > >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It > >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or > ByteArray#'s. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> While those live in #, they are garbage collected objects, so > >> >>>>>>>>> this > >> >>>>>>>>> all lives on the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has > >> >>>>>>>>> to > >> >>>>>>>>> deal with nested arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm currently abusing them as a placeholder for a better > thing. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The Problem > >> >>>>>>>>> > >> >>>>>>>>> ----------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked > >> >>>>>>>>> list > >> >>>>>>>>> in Haskell. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL)) (IORef (Maybe DLL)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers > >> >>>>>>>>> on > >> >>>>>>>>> the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> > >> >>>>>>>>> Maybe > >> >>>>>>>>> DLL ~> DLL > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> That is 3 levels of indirection. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim one by simply unpacking the IORef with > >> >>>>>>>>> -funbox-strict-fields or UNPACK > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and > >> >>>>>>>>> worsening our representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> but now we're still stuck with a level of indirection > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This means that every operation we perform on this structure > >> >>>>>>>>> will > >> >>>>>>>>> be about half of the speed of an implementation in most other > >> >>>>>>>>> languages > >> >>>>>>>>> assuming we're memory bound on loading things into cache!
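[Editorial note: for readers following along, the boxed representation above can be written out as a complete, loadable module; the `newNode` and `link` helpers here are illustrative and not from any library in the thread.]

```haskell
-- A self-contained version of the IORef-based doubly-linked list
-- sketched in the message above.
import Data.IORef

data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil

-- Allocate an isolated node whose prev/next both start at Nil.
newNode :: IO DLL
newNode = DLL <$> newIORef Nil <*> newIORef Nil

-- Link two nodes together: a <-> b.
link :: DLL -> DLL -> IO ()
link a@(DLL _ nextA) b@(DLL prevB _) = do
  writeIORef nextA b
  writeIORef prevB a
link _ _ = return ()

-- Each step still chases DLL ~> MutVar# ~> DLL on the heap,
-- which is exactly the indirection being complained about.
next :: DLL -> IO DLL
next (DLL _ n) = readIORef n
next Nil       = return Nil
```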
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Making Progress > >> >>>>>>>>> > >> >>>>>>>>> ---------------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I have been working on a number of data structures where the > >> >>>>>>>>> indirection of going from something in * out to an object in # > >> >>>>>>>>> which > >> >>>>>>>>> contains the real pointer to my target and coming back > >> >>>>>>>>> effectively doubles > >> >>>>>>>>> my runtime. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the > >> >>>>>>>>> MutVar# > >> >>>>>>>>> onto the mutable list when we dirty it. There is a well > defined > >> >>>>>>>>> write-barrier. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I could change out the representation to use > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, > >> >>>>>>>>> but > >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount > of > >> >>>>>>>>> distinct > >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per > >> >>>>>>>>> object to 2. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the > >> >>>>>>>>> array > >> >>>>>>>>> object and then chase it to the next DLL and chase that to the > >> >>>>>>>>> next array. I > >> >>>>>>>>> do get my two pointers together in memory though. 
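[Editorial note: spelled out against GHC.Prim, the `MutableArray#`-backed representation above looks roughly like this. It is a sketch: the slot layout (0 = prev, 1 = next) is an assumption for illustration.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

data DLL = DLL (MutableArray# RealWorld DLL) | Nil

-- Two distinct heap addresses per step: the node's array, then the
-- element it holds. Slot indices are illustrative.
prev, next :: DLL -> IO DLL
prev Nil     = return Nil
prev (DLL m) = IO (\s -> readArray# m 0# s)
next Nil     = return Nil
next (DLL m) = IO (\s -> readArray# m 1# s)
```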
I'm paying > for > >> >>>>>>>>> a card > >> >>>>>>>>> marking table as well, which I don't particularly need with > just > >> >>>>>>>>> two > >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" > >> >>>>>>>>> machinery added > >> >>>>>>>>> back in 7.10, which is just the old array code as a new data > >> >>>>>>>>> type, which can > >> >>>>>>>>> speed things up a bit when you don't have very big arrays: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> But what if I wanted my object itself to live in # and have > two > >> >>>>>>>>> mutable fields and be able to share the same write barrier? > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. > >> >>>>>>>>> What > >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the > >> >>>>>>>>> impedance > >> >>>>>>>>> mismatch between the imperative world and Haskell, and then > just > >> >>>>>>>>> let the > >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a > >> >>>>>>>>> special > >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even > >> >>>>>>>>> abuse pattern > >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to > >> >>>>>>>>> make this > >> >>>>>>>>> cheaper. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Then I can use the readMutableArrayArray# and > >> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding > >> >>>>>>>>> and next > >> >>>>>>>>> entry in the linked list.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' > into a > >> >>>>>>>>> strict world, and everything there lives in #. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> next :: DLL -> IO DLL > >> >>>>>>>>> > >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of > >> >>>>>>>>> > >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code > to > >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty > >> >>>>>>>>> easily when they > >> >>>>>>>>> are known strict and you chain operations of this sort! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Cleaning it Up > >> >>>>>>>>> > >> >>>>>>>>> ------------------ > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Now I have one outermost indirection pointing to an array that > >> >>>>>>>>> points directly to other arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I > can > >> >>>>>>>>> fix > >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using > a > >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a > >> >>>>>>>>> mixture of > >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data > structure. > >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the > >> >>>>>>>>> existing > >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the > >> >>>>>>>>> arguments it > >> >>>>>>>>> takes. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that > >> >>>>>>>>> would > >> >>>>>>>>> be best left unboxed.
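[Editorial note: filled out with the pragmas and the array argument it needs, that accessor reads as follows. This is a sketch against the 7.10-era GHC.Prim API, and the choice of slot 1 for the next pointer is illustrative.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- One boxed wrapper; everything it reaches is unlifted and lives in #.
data DLL = DLL (MutableArrayArray# RealWorld)

-- Slot 1 is assumed to hold the next node.
next :: DLL -> IO DLL
next (DLL m) = IO $ \s ->
  case readMutableArrayArray# m 1# s of
    (# s', n #) -> (# s', DLL n #)
```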
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently > >> >>>>>>>>> at > >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a > >> >>>>>>>>> boxed or at a > >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int > in > >> >>>>>>>>> question in > >> >>>>>>>>> there. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to > >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having > to > >> >>>>>>>>> go off to > >> >>>>>>>>> the side costs me the entire win from avoiding the first > pointer > >> >>>>>>>>> chase. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could > >> >>>>>>>>> construct that had n words with unsafe access and m pointers > to > >> >>>>>>>>> other heap > >> >>>>>>>>> objects, one that could put itself on the mutable list when > any > >> >>>>>>>>> of those > >> >>>>>>>>> pointers changed then I could shed this last factor of two in > >> >>>>>>>>> all > >> >>>>>>>>> circumstances. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Prototype > >> >>>>>>>>> > >> >>>>>>>>> ------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Over the last few days I've put together a small prototype > >> >>>>>>>>> implementation with a few non-trivial imperative data > structures > >> >>>>>>>>> for things > >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and > >> >>>>>>>>> order-maintenance. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> https://github.com/ekmett/structs > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Notable bits: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of > >> >>>>>>>>> link-cut > >> >>>>>>>>> trees in this style. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that > >> >>>>>>>>> make > >> >>>>>>>>> it go fast. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost > >> >>>>>>>>> all > >> >>>>>>>>> the references to the LinkCut or Object data constructor get > >> >>>>>>>>> optimized away, > >> >>>>>>>>> and we're left with beautiful strict code directly mutating > our > >> >>>>>>>>> underlying > >> >>>>>>>>> representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take this email and turn it into a > short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones > >> >>>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> Just to say that I have no idea what is going on in this > thread. > >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a > >> >>>>>>>>> ticket? Is > >> >>>>>>>>> there a wiki page? > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> If it's important, an ab-initio wiki page + ticket would be a > >> >>>>>>>>> good > >> >>>>>>>>> thing.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On > Behalf > >> >>>>>>>>> Of > >> >>>>>>>>> Edward Kmett > >> >>>>>>>>> Sent: 21 August 2015 05:25 > >> >>>>>>>>> To: Manuel M T Chakravarty > >> >>>>>>>>> Cc: Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would > be > >> >>>>>>>>> very handy as well. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider right now if I have something like an > order-maintenance > >> >>>>>>>>> structure I have: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Upper s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Lower s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The former contains, logically, a mutable integer and two > >> >>>>>>>>> pointers, > >> >>>>>>>>> one for forward and one for backwards. The latter is basically > >> >>>>>>>>> the same > >> >>>>>>>>> thing with a mutable reference up pointing at the structure > >> >>>>>>>>> above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On the heap this is an object that points to a structure for > the > >> >>>>>>>>> bytearray, and points to another structure for each mutvar > which > >> >>>>>>>>> each point > >> >>>>>>>>> to the other 'Upper' structure. So there is a level of > >> >>>>>>>>> indirection smeared > >> >>>>>>>>> over everything. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link > >> >>>>>>>>> from > >> >>>>>>>>> the structure below to the structure above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Converted into ArrayArray#s I'd get > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and > >> >>>>>>>>> the > >> >>>>>>>>> next 2 slots pointing to the previous and next > objects, > >> >>>>>>>>> represented > >> >>>>>>>>> just as their MutableArrayArray#s. I can use > >> >>>>>>>>> sameMutableArrayArray# on these > >> >>>>>>>>> for object identity, which lets me check for the ends of the > >> >>>>>>>>> lists by tying > >> >>>>>>>>> things back on themselves. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and below that > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> is similar, with an extra MutableArrayArray# slot pointing up > to > >> >>>>>>>>> an > >> >>>>>>>>> upper structure. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can then write a handful of combinators for getting out the > >> >>>>>>>>> slots > >> >>>>>>>>> in question, while it has gained a level of indirection > between > >> >>>>>>>>> the wrapper > >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can > >> >>>>>>>>> be basically > >> >>>>>>>>> erased by GHC. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Unlike before I don't have several separate objects on the > heap > >> >>>>>>>>> for > >> >>>>>>>>> each thing. I only have 2 now.
The MutableArrayArray# for the > >> >>>>>>>>> object itself, > >> >>>>>>>>> and the MutableByteArray# that it references to carry around > the > >> >>>>>>>>> mutable > >> >>>>>>>>> int. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The only pain points are > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me > >> >>>>>>>>> from > >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into > an > >> >>>>>>>>> ArrayArray > >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of > >> >>>>>>>>> Haskell, > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid > the > >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 > pointers > >> >>>>>>>>> wide. Card > >> >>>>>>>>> marking doesn't help. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Alternately I could just try to do really evil things and > >> >>>>>>>>> convert > >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to > >> >>>>>>>>> unsafeCoerce my way > >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays > >> >>>>>>>>> directly into the > >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by > >> >>>>>>>>> aping the > >> >>>>>>>>> MutableArrayArray# s API, but that gets really really > dangerous! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the > >> >>>>>>>>> altar > >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them > >> >>>>>>>>> and collect > >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. 
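[Editorial note: concretely, accessors over such an `Upper` might be sketched like this. The slot layout (slot 0 = the MutableByteArray# carrying the mutable int, slots 1 and 2 = previous and next) and the helper names are assumptions; the primops are the 7.10-era GHC.Prim ones.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

data Upper = Upper (MutableArrayArray# RealWorld)

-- Slot 1 assumed to hold the previous Upper.
prev :: Upper -> IO Upper
prev (Upper m) = IO $ \s ->
  case readMutableArrayArray# m 1# s of
    (# s', p #) -> (# s', Upper p #)

-- Chase to the byte array in slot 0, then read the integer out of it.
label :: Upper -> IO Int
label (Upper m) = IO $ \s ->
  case readMutableByteArrayArray# m 0# s of
    (# s1, mba #) -> case readIntArray# mba 0# s1 of
      (# s2, i #) -> (# s2, I# i #)
```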
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty > >> >>>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> That's an interesting idea. > >> >>>>>>>>> > >> >>>>>>>>> Manuel > >> >>>>>>>>> > >> >>>>>>>>> > Edward Kmett : > >> >>>>>>>>> > >> >>>>>>>>> > > >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and > >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the > >> >>>>>>>>> > ArrayArray# entries > >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection > for > >> >>>>>>>>> > the containing > >> >>>>>>>>> > structure is amazing, but I can only currently use it if my > >> >>>>>>>>> > leaf level data > >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd > be > >> >>>>>>>>> > nice to be > >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at > >> >>>>>>>>> > the leaves to > >> >>>>>>>>> > hold lifted contents. > >> >>>>>>>>> > > >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to > >> >>>>>>>>> > access > >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do > that > >> >>>>>>>>> > if I tried to > >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a > >> >>>>>>>>> > ByteArray# > >> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing > >> >>>>>>>>> > this. > >> >>>>>>>>> > > >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection > >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I > >> >>>>>>>>> > could shoehorn a > >> >>>>>>>>> > number of them into ArrayArrays if this worked.
> >> >>>>>>>>> > > >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary > >> >>>>>>>>> > indirection compared to c/java and this could reduce that > pain > >> >>>>>>>>> > to just 1 > >> >>>>>>>>> > level of unnecessary indirection. > >> >>>>>>>>> > > >> >>>>>>>>> > -Edward > >> >>>>>>>>> > >> >>>>>>>>> > _______________________________________________ > >> >>>>>>>>> > ghc-devs mailing list > >> >>>>>>>>> > ghc-devs at haskell.org > >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> _______________________________________________ > >> >>>>>>>>> ghc-devs mailing list > >> >>>>>>>>> ghc-devs at haskell.org > >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> >>>>>>> > >> >>>>>>> > >> >>>>> > >> >>> > >> >> > >> > > >> > > >> > _______________________________________________ > >> > ghc-devs mailing list > >> > ghc-devs at haskell.org > >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> > > > > > > > > > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kolmodin at gmail.com Tue Sep 8 06:11:40 2015 From: kolmodin at gmail.com (Lennart Kolmodin) Date: Tue, 8 Sep 2015 07:11:40 +0100 Subject: AnonymousSums data con syntax In-Reply-To: <1441657274.28403.7.camel@joachim-breitner.de> References: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> <1441657274.28403.7.camel@joachim-breitner.de> Message-ID: 2015-09-07 21:21 GMT+01:00 Joachim Breitner : > Hi, > > Am Montag, den 07.09.2015, 19:25 +0000 schrieb Simon Peyton Jones: > > > Are we okay with stealing some operator sections for this? E.G. (x > > > > > ). 
I think the boxed sums larger than 2 choices are all > technically overlapping with sections. > > I hadn't thought of that. I suppose that in distfix notation we > could require spaces > (x | |) > since vertical bar by itself isn't an operator. But then (_||) x > might feel more compact. > > Also a section (x ||) isn't valid in a pattern, so we would not need > to require spaces there. > > But my gut feel is: yes, with AnonymousSums we should just steal the > syntax. It won't hurt existing code (since it won't use > AnonymousSums), and if you *are* using AnonymousSums then the distfix > notation is probably more valuable than the sections for an operator > you probably aren't using. > > I wonder if this syntax for constructors is really that great. Yes, > there is a similarity with the type constructor (which is nice), but for > the data constructor, do we really want a unary encoding and have our > users count bars? > > I believe the user (and also us, having to read core) would be better > served by some syntax that involves plain numbers. > I reacted the same way to the proposed syntax. Imagine already having an anonymous sum type and then deciding to add another constructor. Naturally you'd have to update your code to handle the new constructor, but you also need to update the code for all other constructors as well by adding another bar in the right place. That seems unnecessary and there's no need to do that for named sum types. What about explicitly stating the index as a number? (1 | Int) :: ( String | Int | Bool ) (#1 | Int #) :: (# String | Int | Bool #) case sum of (0 | myString ) -> ... (1 | myInt ) -> ... (2 | myBool ) -> ... This allows you to at least add new constructors at the end without changing existing code. Is it harder to resolve by type inference since we're not stating the number of constructors? If so we could do something similar to Joachim's proposal: case sum of (0 of 3 | myString ) -> ...
(1 of 3 | myInt ) -> ... (2 of 3 | myBool ) -> ... .. and at least you don't have to count bars. > Given that of is already a keyword, how about something involving "3 > of 4"? For example > > (Put# True in 3 of 5) :: (# a | b | Bool | d | e #) > > and > > case sum of > (Put# x in 1 of 3) -> ... > (Put# x in 2 of 3) -> ... > (Put# x in 3 of 3) -> ... > > (If "as" were a keyword, (Put# x as 2 of 3) would sound even better.) > > > I don't find this particular choice very great, but something with > numbers rather than ASCII art seems to make more sense here. Is there > something even better? > > Greetings, > Joachim > > > > > -- > Joachim "nomeata" Breitner > mail at joachim-breitner.de • http://www.joachim-breitner.de/ > Jabber: nomeata at joachim-breitner.de • GPG-Key: 0xF0FBF51F > Debian Developer: nomeata at debian.org > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marlowsd at gmail.com Tue Sep 8 07:40:29 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 08:40:29 +0100 Subject: ArrayArrays In-Reply-To: References: <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <55EE90ED.1040609@gmail.com> This would be very cool; however, it's questionable whether it's worth it. Without any unlifted kind, we need - ArrayArray# - a set of new/read/write primops for every element type, either built-in or made from unsafeCoerce# With the unlifted kind, we would need - ArrayArray# - one set of new/read/write primops With levity polymorphism, we would need - none of this, Array# can be used So having an unlifted kind already kills a lot of the duplication, polymorphism only kills a bit more.
Cheers Simon On 08/09/2015 00:14, Edward Kmett wrote: > Assume we had the ability to talk about Levity in a new way and instead > of just: > > data Levity = Lifted | Unlifted > > type * = TYPE 'Lifted > type # = TYPE 'Unlifted > > we had a more nuanced notion of TYPE parameterized on another > data type: > > data Levity = Lifted | Unlifted > data Param = Composite | Simple Levity > > and we parameterized TYPE with a Param rather than Levity. > > Existing strange representations can continue to live in TYPE 'Composite > > (# Int# , Double #) :: TYPE 'Composite > > and we don't support parametricity in there, just like, currently, we > don't allow parametricity in #. > > We can include the undefined example from Richard's talk: > > undefined :: forall (v :: Param). v > > and ultimately lift it into his pi type when it is available just as before. > > But we could consider TYPE ('Simple 'Unlifted) as a form of > 'parametric #' covering unlifted things we're willing to allow > polymorphism over because they are just pointers to something in the > heap, that just happens to not be able to be _|_ or a thunk. > > In this setting (recalling that above I modified Richard's TYPE to take > a Param instead of a Levity), we can define a type alias for things that > live as a simple pointer to a heap-allocated object: > > type GC (l :: Levity) = TYPE ('Simple l) > type * = GC 'Lifted > > and then we can look at existing primitives generalized: > > Array# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted > MutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted > SmallArray# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted > SmallMutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC > 'Unlifted > MutVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted > MVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted > > Weak#, StablePtr#, StableName#, etc. all can take similar modifications.
> > Recall that an ArrayArray# was just an Array# hacked up to be able to > hold onto the subset of # that is collectable. > > Almost all of the operations on these data types can work on the more > general kind of argument. > > newArray# :: forall (s :: *) (l :: Levity) (a :: GC l). Int# -> a -> > State# s -> (# State# s, MutableArray# s a #) > > writeArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# > s a -> Int# -> a -> State# s -> State# s > > readArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s > a -> Int# -> State# s -> (# State# s, a #) > > etc. > > Only a couple of our existing primitives _can't_ generalize this way. > The one that leaps to mind is atomicModifyMutVar, which would need to > stay constrained to only work on arguments in *, because of the way it > operates. > > With that we can still talk about > > MutableArray# s Int > > but now we can also talk about: > > MutableArray# s (MutableArray# s Int) > > without the layer of indirection through a box in * and without an > explosion of primops. The same newFoo, readFoo, writeFoo machinery works > for both kinds. > > The struct machinery doesn't get to take advantage of this, but it would > let us clean house elsewhere in Prim and drastically improve the range > of applicability of the existing primitives with nothing more than a > small change to the levity machinery. > > I'm not attached to any of the names above, I coined them just to give > us a concrete thing to talk about. > > Here I'm only proposing we extend machinery in GHC.Prim this way, but an > interesting 'now that the barn door is open' question is to consider > that our existing Haskell data types often admit a similar form of > parametricity and nothing in principle prevents this from working for > Maybe or [] and once you permit inference to fire across all of GC l > then it seems to me that you'd start to get those same capabilities > there as well when LevityPolymorphism was turned on. 
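[Editorial note: none of the generalized signatures above existed in GHC at the time; they are the proposal. The then-available stopgap alluded to earlier in the thread, coercing the existing lifted-array primitives, can be sketched roughly as below. The helper names are hypothetical, and this relies on the 7.10-era unsafeCoerce# performing no kind check, which is exactly the unsafety the proposal would remove.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts

-- Treat an element slot of a lifted Array# as if it held an unlifted
-- MutableArray#. Nothing stops a later read at the wrong type.
writeArraySlot :: MutableArray# s Any -> Int#
               -> MutableArray# s a
               -> State# s -> State# s
writeArraySlot m i a s = writeArray# m i (unsafeCoerce# a) s

readArraySlot :: MutableArray# s Any -> Int#
              -> State# s -> (# State# s, MutableArray# s a #)
readArraySlot m i s =
  case readArray# m i s of
    (# s', x #) -> (# s', unsafeCoerce# x #)
```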
> > -Edward > > On Mon, Sep 7, 2015 at 5:56 PM, Simon Peyton Jones > > wrote: > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing > the need for folks to descend into the use of the more evil > structure primitives we're talking about, and letting us keep a few > more principles around us.____ > > __ __ > > I?m lost. Can you give some concrete examples that illustrate how > levity polymorphism will help us?____ > > > Simon____ > > __ __ > > *From:*Edward Kmett [mailto:ekmett at gmail.com ] > *Sent:* 07 September 2015 21:17 > *To:* Simon Peyton Jones > *Cc:* Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T > Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays____ > > __ __ > > I had a brief discussion with Richard during the Haskell Symposium > about how we might be able to let parametricity help a bit in > reducing the space of necessarily primops to a slightly more > manageable level. ____ > > __ __ > > Notably, it'd be interesting to explore the ability to allow > parametricity over the portion of # that is just a gcptr.____ > > __ __ > > We could do this if the levity polymorphism machinery was tweaked a > bit. 
You could envision the ability to abstract over things in both > * and the subset of # that are represented by a gcptr, then > modifying the existing array primitives to be parametric in that > choice of levity for their argument so long as it was of a "heap > object" levity.____ > > __ __ > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing > the need for folks to descend into the use of the more evil > structure primitives we're talking about, and letting us keep a few > more principles around us.____ > > __ __ > > Then in the cases like `atomicModifyMutVar#` where it needs to > actually be in * rather than just a gcptr, due to the constructed > field selectors it introduces on the heap then we could keep the > existing less polymorphic type.____ > > __ __ > > -Edward____ > > __ __ > > On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > > wrote:____ > > It was fun to meet and discuss this.____ > > ____ > > Did someone volunteer to write a wiki page that describes the > proposed design? And, I earnestly hope, also describes the > menagerie of currently available array types and primops so that > users can have some chance of picking the right one?!____ > > ____ > > Thanks____ > > ____ > > Simon____ > > ____ > > *From:*ghc-devs [mailto:ghc-devs-bounces at haskell.org > ] *On Behalf Of *Ryan Newton > *Sent:* 31 August 2015 23:11 > *To:* Edward Kmett; Johan Tibell > *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; > ghc-devs; Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays____ > > ____ > > Dear Edward, Ryan Yates, and other interested parties -- ____ > > ____ > > So when should we meet up about this?____ > > ____ > > May I propose the Tues afternoon break for everyone at ICFP who > is interested in this topic? 
We can meet out in the coffee area > and congregate around Edward Kmett, who is tall and should be > easy to find ;-). > > I think Ryan is going to show us how to use his new primops for > combined array + other fields in one heap object? > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett > wrote: > > Without a custom primitive it doesn't help much there, you > have to store the indirection to the mask. > > With a custom primitive it should cut the on heap > root-to-leaf path of everything in the HAMT in half. A > shorter HashMap was actually one of the motivating factors > for me doing this. It is rather astoundingly difficult to > beat the performance of HashMap, so I had to start cheating > pretty badly. ;) > > -Edward > > On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > > > wrote: > > I'd also be interested to chat at ICFP to see if I can > use this for my HAMT implementation. > > On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > > wrote: > > Sounds good to me. Right now I'm just hacking up > composable accessors for "typed slots" in a fairly > lens-like fashion, and treating the set of slots I > define and the 'new' function I build for the data > type as its API, and build atop that. This could > eventually graduate to template-haskell, but I'm not > entirely satisfied with the solution I have. I > currently distinguish between what I'm calling > "slots" (things that point directly to another > SmallMutableArrayArray# sans wrapper) and "fields" > which point directly to the usual Haskell data types > because unifying the two notions meant that I > couldn't lift some coercions out "far enough" to > make them vanish. > > I'll be happy to run through my current working set > of issues in person and -- as things get nailed down > further -- in a longer lived medium than in personal > conversations.
;) > > -Edward > > On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > > > wrote: > > I'd also love to meet up at ICFP and discuss > this. I think the array primops plus a TH layer > that lets us (ab)use them many times without too > much marginal cost sounds great. And I'd like > to learn how we could be either early users of, > or help with, this infrastructure. > > CC'ing in Ryan Scott and Omer Agacan who may also > be interested in dropping in on such discussions > @ICFP, and Chao-Hong Chen, a Ph.D. student who > is currently working on concurrent data > structures in Haskell, but will not be at ICFP. > > On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > > wrote: > > I completely agree. I would love to spend > some time during ICFP and > friends talking about what it could look > like. My small array for STM > changes for the RTS can be seen here [1]. > It is on a branch somewhere > between 7.8 and 7.10 and includes irrelevant > STM bits and some > confusing naming choices (sorry), but should > cover all the details > needed to implement it for a non-STM > context. The biggest surprise > for me was following small array too closely > and having a word/byte > offset mismatch [2]. > > [1]: > https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut > [2]: > https://ghc.haskell.org/trac/ghc/ticket/10413 > > Ryan > > > On Fri, Aug 28, 2015 at 10:09 PM, Edward > Kmett > wrote: > > I'd love to have that last 10%, but it's a > lot of work to get there and more > > importantly I don't know quite what it > should look like. > > > > On the other hand, I do have a pretty > good idea of how the primitives above > > could be banged out and tested in a long > evening, well in time for 7.12. And > > as noted earlier, those remain useful > even if a nicer typed version with an > > extra level of indirection to the sizes > is built up after.
> > > > The rest sounds like a good graduate > student project for someone who has > > graduate students lying around. Maybe > somebody at Indiana University who has > > an interest in type theory and > parallelism can find us one. =) > > > > -Edward > > > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan > Yates > wrote: > >> > >> I think from my perspective, the > motivation for getting the type > >> checker involved is primarily bringing > this to the level where users > >> could be expected to build these > structures. it is reasonable to > >> think that there are people who want to > use STM (a context with > >> mutation already) to implement a > straight forward data structure that > >> avoids extra indirection penalty. There > should be some places where > >> knowing that things are field accesses > rather then array indexing > >> could be helpful, but I think GHC is > good right now about handling > >> constant offsets. In my code I don't do > any bounds checking as I know > >> I will only be accessing my arrays with > constant indexes. I make > >> wrappers for each field access and leave > all the unsafe stuff in > >> there. When things go wrong though, the > compiler is no help. Maybe > >> template Haskell that generates the > appropriate wrappers is the right > >> direction to go. > >> There is another benefit for me when > working with these as arrays in > >> that it is quite simple and direct > (given the hoops already jumped > >> through) to play with alignment. I can > ensure two pointers are never > >> on the same cache-line by just spacing > things out in the array. > >> > >> On Fri, Aug 28, 2015 at 7:33 PM, Edward > Kmett > wrote: > >> > They just segfault at this level. ;) > >> > > >> > Sent from my iPhone > >> > > >> > On Aug 28, 2015, at 7:25 PM, Ryan > Newton > wrote: > >> > > >> > You presumably also save a bounds > check on reads by hard-coding the > >> > sizes? 
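Ryan's wrapper pattern above — constant-index unsafe reads hidden behind named field accessors, with all the layout knowledge confined to one place — can be sketched in ordinary Haskell using the `primitive` package's SmallArray, whose reads and writes do no bounds checking. This is an editorial illustration, not code from the thread; the type and function names are invented for the sketch:

```haskell
import Data.Primitive.SmallArray
import Control.Monad.Primitive (PrimState)

-- A "struct" with two fields stored at fixed slots of one small array.
newtype Node = Node (SmallMutableArray (PrimState IO) Int)

newNode :: Int -> Int -> IO Node
newNode a b = do
  arr <- newSmallArray 2 a   -- slot 0 initialised to a
  writeSmallArray arr 1 b    -- slot 1 holds b
  pure (Node arr)

-- Field accessors: the constant index is the only place the unchecked
-- layout knowledge lives, so "when things go wrong" is confined here.
getA, getB :: Node -> IO Int
getA (Node arr) = readSmallArray arr 0
getB (Node arr) = readSmallArray arr 1
```

Template Haskell generating such wrappers, as Ryan suggests, would remove the remaining copy-paste.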
> >> > > >> > On Fri, Aug 28, 2015 at 3:39 PM, > Edward Kmett > wrote: > >> >> > >> >> Also there are 4 different "things" > here, basically depending on two > >> >> independent questions: > >> >> > >> >> a.) if you want to shove the sizes > into the info table, and > >> >> b.) if you want cardmarking. > >> >> > >> >> Versions with/without cardmarking for > different sizes can be done > >> >> pretty > >> >> easily, but as noted, the infotable > variants are pretty invasive. > >> >> > >> >> -Edward > >> >> > >> >> On Fri, Aug 28, 2015 at 6:36 PM, > Edward Kmett > wrote: > >> >>> > >> >>> Well, on the plus side you'd save 16 > bytes per object, which adds up > >> >>> if > >> >>> they were small enough and there are > enough of them. You get a bit > >> >>> better > >> >>> locality of reference in terms of > what fits in the first cache line of > >> >>> them. > >> >>> > >> >>> -Edward > >> >>> > >> >>> On Fri, Aug 28, 2015 at 6:14 PM, > Ryan Newton > > >> >>> wrote: > >> >>>> > >> >>>> Yes. And for the short term I can > imagine places we will settle with > >> >>>> arrays even if it means tracking > lengths unnecessarily and > >> >>>> unsafeCoercing > >> >>>> pointers whose types don't actually > match their siblings. > >> >>>> > >> >>>> Is there anything to recommend the > hacks mentioned for fixed sized > >> >>>> array > >> >>>> objects *other* than using them to > fake structs? (Much to > >> >>>> derecommend, as > >> >>>> you mentioned!) > >> >>>> > >> >>>> On Fri, Aug 28, 2015 at 3:07 PM > Edward Kmett > > >> >>>> wrote: > >> >>>>> > >> >>>>> I think both are useful, but the > one you suggest requires a lot more > >> >>>>> plumbing and doesn't subsume all > of the usecases of the other. > >> >>>>> > >> >>>>> -Edward > >> >>>>> > >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, > Ryan Newton > > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> So that primitive is an array > like thing (Same pointed type, > >> >>>>>> unbounded > >> >>>>>> length) with extra payload. 
> >> >>>>>> > >> >>>>>> I can see how we can do without > structs if we have arrays, > >> >>>>>> especially > >> >>>>>> with the extra payload at front. > But wouldn't the general solution > >> >>>>>> for > >> >>>>>> structs be one that that allows > new user data type defs for # > >> >>>>>> types? > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM > Edward Kmett > > >> >>>>>> wrote: > >> >>>>>>> > >> >>>>>>> Some form of MutableStruct# with > a known number of words and a > >> >>>>>>> known > >> >>>>>>> number of pointers is basically > what Ryan Yates was suggesting > >> >>>>>>> above, but > >> >>>>>>> where the word counts were > stored in the objects themselves. > >> >>>>>>> > >> >>>>>>> Given that it'd have a couple of > words for those counts it'd > >> >>>>>>> likely > >> >>>>>>> want to be something we build in > addition to MutVar# rather than a > >> >>>>>>> replacement. > >> >>>>>>> > >> >>>>>>> On the other hand, if we had to > fix those numbers and build info > >> >>>>>>> tables that knew them, and > typechecker support, for instance, it'd > >> >>>>>>> get > >> >>>>>>> rather invasive. > >> >>>>>>> > >> >>>>>>> Also, a number of things that we > can do with the 'sized' versions > >> >>>>>>> above, like working with evil > unsized c-style arrays directly > >> >>>>>>> inline at the > >> >>>>>>> end of the structure cease to be > possible, so it isn't even a pure > >> >>>>>>> win if we > >> >>>>>>> did the engineering effort. > >> >>>>>>> > >> >>>>>>> I think 90% of the needs I have > are covered just by adding the one > >> >>>>>>> primitive. The last 10% gets > pretty invasive. > >> >>>>>>> > >> >>>>>>> -Edward > >> >>>>>>> > >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, > Ryan Newton > > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> I like the possibility of a > general solution for mutable structs > >> >>>>>>>> (like Ed said), and I'm trying > to fully understand why it's hard. 
> >> >>>>>>>> > >> >>>>>>>> So, we can't unpack MutVar into > constructors because of object > >> >>>>>>>> identity problems. But what > about directly supporting an > >> >>>>>>>> extensible set of > >> >>>>>>>> unlifted MutStruct# objects, > generalizing (and even replacing) > >> >>>>>>>> MutVar#? That > >> >>>>>>>> may be too much work, but is it > problematic otherwise? > >> >>>>>>>> > >> >>>>>>>> Needless to say, this is also > critical if we ever want best in > >> >>>>>>>> class > >> >>>>>>>> lockfree mutable structures, > just like their Stm and sequential > >> >>>>>>>> counterparts. > >> >>>>>>>> > >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM > Simon Peyton Jones > >> >>>>>>>> > wrote: > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take > this email and turn it into a short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> Yes, please do make it into a > wiki page on the GHC Trac, and > >> >>>>>>>>> maybe > >> >>>>>>>>> make a ticket for it. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Thanks > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: Edward Kmett > [mailto:ekmett at gmail.com > ] > >> >>>>>>>>> Sent: 27 August 2015 16:54 > >> >>>>>>>>> To: Simon Peyton Jones > >> >>>>>>>>> Cc: Manuel M T Chakravarty; > Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# is just an > Array# with a modified invariant. It > >> >>>>>>>>> points directly to other > unlifted ArrayArray#'s or ByteArray#'s. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> While those live in #, they > are garbage collected objects, so > >> >>>>>>>>> this > >> >>>>>>>>> all lives on the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> They were added to make some > of the DPH stuff fast when it has > >> >>>>>>>>> to > >> >>>>>>>>> deal with nested arrays. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm currently abusing them as > a placeholder for a better thing. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The Problem > >> >>>>>>>>> > >> >>>>>>>>> ----------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider the scenario where > you write a classic doubly-linked > >> >>>>>>>>> list > >> >>>>>>>>> in Haskell. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (IORef (Maybe > DLL)) (IORef (Maybe DLL)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Chasing from one DLL to the > next requires following 3 pointers > >> >>>>>>>>> on > >> >>>>>>>>> the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> > MutVar# RealWorld (Maybe DLL) ~> > >> >>>>>>>>> Maybe > >> >>>>>>>>> DLL ~> DLL > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> That is 3 levels of indirection. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim one by simply > unpacking the IORef with > >> >>>>>>>>> -funbox-strict-fields or UNPACK > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We can trim another by adding > a 'Nil' constructor for DLL and > >> >>>>>>>>> worsening our representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL !(IORef DLL) > !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> but now we're still stuck with > a level of indirection > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL > ~> DLL > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This means that every > operation we perform on this structure > >> >>>>>>>>> will > >> >>>>>>>>> be about half of the speed of > an implementation in most other > >> >>>>>>>>> languages > >> >>>>>>>>> assuming we're memory bound on > loading things into cache!
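The three-hop chain described above is easy to reproduce in ordinary Haskell. The following runnable illustration of the naive representation is an editorial sketch, not code from the thread (the helper names are invented):

```haskell
import Data.IORef

-- The naive doubly-linked list: every node holds two IORefs, and each
-- IORef is itself a heap object, so following one logical link means
-- DLL ~> IORef (Maybe DLL) ~> MutVar# ~> Maybe DLL ~> DLL.
data DLL = DLL { prevRef :: IORef (Maybe DLL), nextRef :: IORef (Maybe DLL) }

newNode :: IO DLL
newNode = DLL <$> newIORef Nothing <*> newIORef Nothing

link :: DLL -> DLL -> IO ()
link a b = do
  writeIORef (nextRef a) (Just b)
  writeIORef (prevRef b) (Just a)

main :: IO ()
main = do
  a <- newNode
  b <- newNode
  link a b
  m <- readIORef (nextRef a)   -- one logical step, several pointer hops
  case m of
    Just _  -> putStrLn "linked"
    Nothing -> error "unlinked"
```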
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Making Progress > >> >>>>>>>>> > >> >>>>>>>>> ---------------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I have been working on a > number of data structures where the > >> >>>>>>>>> indirection of going from > something in * out to an object in # > >> >>>>>>>>> which > >> >>>>>>>>> contains the real pointer to > my target and coming back > >> >>>>>>>>> effectively doubles > >> >>>>>>>>> my runtime. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> We go out to the MutVar# > because we are allowed to put the > >> >>>>>>>>> MutVar# > >> >>>>>>>>> onto the mutable list when we > dirty it. There is a well defined > >> >>>>>>>>> write-barrier. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I could change out the > representation to use > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL (MutableArray# > RealWorld DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can just store two pointers > in the MutableArray# every time, > >> >>>>>>>>> but > >> >>>>>>>>> this doesn't help _much_ > directly. It has reduced the amount of > >> >>>>>>>>> distinct > >> >>>>>>>>> addresses in memory I touch on > a walk of the DLL from 3 per > >> >>>>>>>>> object to 2. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I still have to go out to the > heap from my DLL and get to the > >> >>>>>>>>> array > >> >>>>>>>>> object and then chase it to > the next DLL and chase that to the > >> >>>>>>>>> next array. I > >> >>>>>>>>> do get my two pointers > together in memory though. 
I'm paying for > >> >>>>>>>>> a card > >> >>>>>>>>> marking table as well, which I > don't particularly need with just > >> >>>>>>>>> two > >> >>>>>>>>> pointers, but we can shed that > with the "SmallMutableArray#" > >> >>>>>>>>> machinery added > >> >>>>>>>>> back in 7.10, which is just > the old array code as a new data > >> >>>>>>>>> type, which can > >> >>>>>>>>> speed things up a bit when you > don't have very big arrays: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL > (SmallMutableArray# RealWorld DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> But what if I wanted my object > itself to live in # and have two > >> >>>>>>>>> mutable fields and be able to > share the same write barrier? > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# points directly > to other unlifted array types. > >> >>>>>>>>> What > >> >>>>>>>>> if we have one # -> * wrapper > on the outside to deal with the > >> >>>>>>>>> impedance > >> >>>>>>>>> mismatch between the > imperative world and Haskell, and then just > >> >>>>>>>>> let the > >> >>>>>>>>> ArrayArray#'s hold other > arrayarrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLL = DLL > (MutableArrayArray# RealWorld) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> now I need to make up a new > Nil, which I can just make be a > >> >>>>>>>>> special > >> >>>>>>>>> MutableArrayArray# I allocate > on program startup. I can even > >> >>>>>>>>> abuse pattern > >> >>>>>>>>> synonyms. Alternately I can > exploit the internals further to > >> >>>>>>>>> make this > >> >>>>>>>>> cheaper. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Then I can use the > readMutableArrayArray# and > >> >>>>>>>>> writeMutableArrayArray# calls > to directly access the preceding > >> >>>>>>>>> and next > >> >>>>>>>>> entry in the linked list.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So now we have one DLL wrapper > which just 'bootstraps me' into a > >> >>>>>>>>> strict world, and everything > there lives in #. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> next :: DLL -> IO DLL > >> >>>>>>>>> > >> >>>>>>>>> next (DLL m) = IO $ \s -> case > readMutableArrayArray# m 1# s of > >> >>>>>>>>> > >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> It turns out GHC is quite > happy to optimize all of that code to > >> >>>>>>>>> keep things unboxed. The 'DLL' > wrappers get removed pretty > >> >>>>>>>>> easily when they > >> >>>>>>>>> are known strict and you chain > operations of this sort! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Cleaning it Up > >> >>>>>>>>> > >> >>>>>>>>> ------------------ > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Now I have one outermost > indirection pointing to an array that > >> >>>>>>>>> points directly to other arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm stuck paying for a card > marking table per object, but I can > >> >>>>>>>>> fix > >> >>>>>>>>> that by duplicating the code > for MutableArrayArray# and using a > >> >>>>>>>>> SmallMutableArray#. I can hack > up primops that let me store a > >> >>>>>>>>> mixture of > >> >>>>>>>>> SmallMutableArray# fields and > normal ones in the data structure. > >> >>>>>>>>> Operationally, I can even do > so by just unsafeCoercing the > >> >>>>>>>>> existing > >> >>>>>>>>> SmallMutableArray# primitives > to change the kind of one of the > >> >>>>>>>>> arguments it > >> >>>>>>>>> takes. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This is almost ideal, but not > quite. I often have fields that > >> >>>>>>>>> would > >> >>>>>>>>> be best left unboxed.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data DLLInt = DLL !Int !(IORef > DLL) !(IORef DLL) | Nil > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> was able to unpack the Int, > but we lost that. We can currently > >> >>>>>>>>> at > >> >>>>>>>>> best point one of the entries > of the SmallMutableArray# at a > >> >>>>>>>>> boxed or at a > >> >>>>>>>>> MutableByteArray# for all of > our misc. data and shove the int in > >> >>>>>>>>> question in > >> >>>>>>>>> there. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> e.g. if I were to implement a > hash-array-mapped-trie I need to > >> >>>>>>>>> store masks and administrivia > as I walk down the tree. Having to > >> >>>>>>>>> go off to > >> >>>>>>>>> the side costs me the entire > win from avoiding the first pointer > >> >>>>>>>>> chase. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> But, if like Ryan suggested, > we had a heap object we could > >> >>>>>>>>> construct that had n words > with unsafe access and m pointers to > >> >>>>>>>>> other heap > >> >>>>>>>>> objects, one that could put > itself on the mutable list when any > >> >>>>>>>>> of those > >> >>>>>>>>> pointers changed then I could > shed this last factor of two in > >> >>>>>>>>> all > >> >>>>>>>>> circumstances. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Prototype > >> >>>>>>>>> > >> >>>>>>>>> ------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Over the last few days I've > put together a small prototype > >> >>>>>>>>> implementation with a few > non-trivial imperative data structures > >> >>>>>>>>> for things > >> >>>>>>>>> like Tarjan's link-cut trees, > the list labeling problem and > >> >>>>>>>>> order-maintenance. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> https://github.com/ekmett/structs > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Notable bits: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal.LinkCut > provides an implementation of > >> >>>>>>>>> link-cut > >> >>>>>>>>> trees in this style. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Data.Struct.Internal provides > the rather horrifying guts that > >> >>>>>>>>> make > >> >>>>>>>>> it go fast. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Once compiled with -O or -O2, > if you look at the core, almost > >> >>>>>>>>> all > >> >>>>>>>>> the references to the LinkCut > or Object data constructor get > >> >>>>>>>>> optimized away, > >> >>>>>>>>> and we're left with beautiful > strict code directly mutating our > >> >>>>>>>>> underlying > >> >>>>>>>>> representation. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> At the very least I'll take > this email and turn it into a short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 > AM, Simon Peyton Jones > >> >>>>>>>>> > wrote: > >> >>>>>>>>> > >> >>>>>>>>> Just to say that I have no > idea what is going on in this thread. > >> >>>>>>>>> What is ArrayArray? What is > the issue in general? Is there a > >> >>>>>>>>> ticket? Is > >> >>>>>>>>> there a wiki page? > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> If it's important, an > ab-initio wiki page + ticket would be a > >> >>>>>>>>> good > >> >>>>>>>>> thing.
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: ghc-devs > [mailto:ghc-devs-bounces at haskell.org > ] On Behalf > >> >>>>>>>>> Of > >> >>>>>>>>> Edward Kmett > >> >>>>>>>>> Sent: 21 August 2015 05:25 > >> >>>>>>>>> To: Manuel M T Chakravarty > >> >>>>>>>>> Cc: Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> When (ab)using them for this > purpose, SmallArrayArray's would be > >> >>>>>>>>> very handy as well. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider right now if I have > something like an order-maintenance > >> >>>>>>>>> structure I have: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper {-# > UNPACK #-} !(MutableByteArray s) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper > s)) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Upper s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower {-# > UNPACK #-} !(MutVar s (Upper s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutableByteArray > s) {-# UNPACK #-} !(MutVar s > >> >>>>>>>>> (Lower s)) {-# > >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The former contains, > logically, a mutable integer and two > >> >>>>>>>>> pointers, > >> >>>>>>>>> one for forward and one for > backwards. The latter is basically > >> >>>>>>>>> the same > >> >>>>>>>>> thing with a mutable reference > up pointing at the structure > >> >>>>>>>>> above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On the heap this is an object > that points to a structure for the > >> >>>>>>>>> bytearray, and points to > another structure for each mutvar which > >> >>>>>>>>> each point > >> >>>>>>>>> to the other 'Upper' > structure. So there is a level of > >> >>>>>>>>> indirection smeared > >> >>>>>>>>> over everything. 
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> So this is a pair of doubly > linked lists with an upward link > >> >>>>>>>>> from > >> >>>>>>>>> the structure below to the > structure above. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Converted into ArrayArray#s > I'd get > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Upper s = Upper > (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> w/ the first slot being a > pointer to a MutableByteArray#, and > >> >>>>>>>>> the > >> >>>>>>>>> next 2 slots pointing to the > previous and next previous objects, > >> >>>>>>>>> represented > >> >>>>>>>>> just as their > MutableArrayArray#s. I can use > >> >>>>>>>>> sameMutableArrayArray# on these > >> >>>>>>>>> for object identity, which > lets me check for the ends of the > >> >>>>>>>>> lists by tying > >> >>>>>>>>> things back on themselves. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and below that > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> data Lower s = Lower > (MutableArrayArray# s) > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> is similar, with an extra > MutableArrayArray slot pointing up to > >> >>>>>>>>> an > >> >>>>>>>>> upper structure. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I can then write a handful of > combinators for getting out the > >> >>>>>>>>> slots > >> >>>>>>>>> in question, while it has > gained a level of indirection between > >> >>>>>>>>> the wrapper > >> >>>>>>>>> to put it in * and the > MutableArrayArray# s in #, that one can > >> >>>>>>>>> be basically > >> >>>>>>>>> erased by ghc. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Unlike before I don't have > several separate objects on the heap > >> >>>>>>>>> for > >> >>>>>>>>> each thing. I only have 2 now. 
> The MutableArrayArray# for the > >> >>>>>>>>> object itself, > >> >>>>>>>>> and the MutableByteArray# that > it references to carry around the > >> >>>>>>>>> mutable > >> >>>>>>>>> int. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The only pain points are > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 1.) the aforementioned > limitation that currently prevents me > >> >>>>>>>>> from > >> >>>>>>>>> stuffing normal boxed data > through a SmallArray or Array into an > >> >>>>>>>>> ArrayArray > >> >>>>>>>>> leaving me in a little ghetto > disconnected from the rest of > >> >>>>>>>>> Haskell, > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> and > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> 2.) the lack of > SmallArrayArray's, which could let us avoid the > >> >>>>>>>>> card marking overhead. These > objects are all small, 3-4 pointers > >> >>>>>>>>> wide. Card > >> >>>>>>>>> marking doesn't help. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Alternately I could just try > to do really evil things and > >> >>>>>>>>> convert > >> >>>>>>>>> the whole mess to SmallArrays > and then figure out how to > >> >>>>>>>>> unsafeCoerce my way > >> >>>>>>>>> to glory, stuffing the #'d > references to the other arrays > >> >>>>>>>>> directly into the > >> >>>>>>>>> SmallArray as slots, removing > the limitation we see here by > >> >>>>>>>>> aping the > >> >>>>>>>>> MutableArrayArray# s API, but > that gets really really dangerous! > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm pretty much willing to > sacrifice almost anything on the > >> >>>>>>>>> altar > >> >>>>>>>>> of speed here, but I'd like to > be able to let the GC move them > >> >>>>>>>>> and collect > >> >>>>>>>>> them which rules out simpler > Ptr and Addr based solutions. 
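The Upper encoding Edward describes can be sketched against the ArrayArray# primops. This is an editorial sketch, not code from the thread: the slot layout (0 for the byte array, 1 for previous, 2 for next), the function names, and the self-pointing "end of list" convention are all this sketch's assumptions, and it needs a GHC that still ships the ArrayArray# primops (they were removed in GHC 9.4):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

data Upper = Upper (MutableArrayArray# RealWorld)

-- Assumed layout: slot 0 = MutableByteArray# payload,
-- slot 1 = previous Upper, slot 2 = next Upper.
newUpper :: IO Upper
newUpper = IO $ \s -> case newArrayArray# 3# s of
  (# s1, m #) ->
    -- tie prev/next back to ourselves to mark the ends of the list
    case writeMutableArrayArray# m 1# m s1 of
      s2 -> case writeMutableArrayArray# m 2# m s2 of
        s3 -> (# s3, Upper m #)

nextUpper :: Upper -> IO Upper
nextUpper (Upper m) = IO $ \s -> case readMutableArrayArray# m 2# s of
  (# s', n #) -> (# s', Upper n #)

-- Object identity, used to detect the self-pointing list ends.
sameUpper :: Upper -> Upper -> Bool
sameUpper (Upper a) (Upper b) = isTrue# (sameMutableArrayArray# a b)
```

The `Upper` wrapper only exists to put the structure back in *; as the message says, GHC can erase it in strict, chained code.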
> >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -Edward > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 > PM, Manuel M T Chakravarty > >> >>>>>>>>> > wrote: > >> >>>>>>>>> > >> >>>>>>>>> That's an interesting idea. > >> >>>>>>>>> > >> >>>>>>>>> Manuel > >> >>>>>>>>> > >> >>>>>>>>> > Edward Kmett > >: > >> >>>>>>>>> > >> >>>>>>>>> > > >> >>>>>>>>> > Would it be possible to add > unsafe primops to add Array# and > >> >>>>>>>>> > SmallArray# entries to an > ArrayArray#? The fact that the > >> >>>>>>>>> > ArrayArray# entries > >> >>>>>>>>> > are all directly unlifted > avoiding a level of indirection for > >> >>>>>>>>> > the containing > >> >>>>>>>>> > structure is amazing, but I > can only currently use it if my > >> >>>>>>>>> > leaf level data > >> >>>>>>>>> > can be 100% unboxed and > distributed among ByteArray#s. It'd be > >> >>>>>>>>> > nice to be > >> >>>>>>>>> > able to have the ability to > put SmallArray# a stuff down at > >> >>>>>>>>> > the leaves to > >> >>>>>>>>> > hold lifted contents. > >> >>>>>>>>> > > >> >>>>>>>>> > I accept fully that if I > name the wrong type when I go to > >> >>>>>>>>> > access > >> >>>>>>>>> > one of the fields it'll lie > to me, but I suppose it'd do that > >> >>>>>>>>> > if I tried to > >> >>>>>>>>> > use one of the members that > held a nested ArrayArray# as a > >> >>>>>>>>> > ByteArray# > >> >>>>>>>>> > anyways, so it isn't like > there is a safety story preventing > >> >>>>>>>>> > this. > >> >>>>>>>>> > > >> >>>>>>>>> > I've been hunting for ways > to try to kill the indirection > >> >>>>>>>>> > problems I get with Haskell > and mutable structures, and I > >> >>>>>>>>> > could shoehorn a > >> >>>>>>>>> > number of them into > ArrayArrays if this worked.
> >> >>>>>>>>> > > >> >>>>>>>>> > Right now I'm stuck paying > for 2 or 3 levels of unnecessary > >> >>>>>>>>> > indirection compared to > c/java and this could reduce that pain > >> >>>>>>>>> > to just 1 > >> >>>>>>>>> > level of unnecessary > indirection. > >> >>>>>>>>> > > >> >>>>>>>>> > -Edward > >> >>>>>>>>> > > _______________________________________________ > >> >>>>>>>>> > ghc-devs mailing list > >> >>>>>>>>> > ghc-devs at haskell.org > >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > From simonpj at microsoft.com Tue Sep 8 07:40:51 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 07:40:51 +0000 Subject: Unlifted data types In-Reply-To: <1441663307-sup-612@sabre> References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <1441661177-sup-2150@sabre> <9cafcebc6d274b2385f202a4fd224174@DB4PR30MB030.064d.mgd.msft.net> <1441663307-sup-612@sabre> Message-ID: <11b6bb1806894856b0fcedda6884e083@DB4PR30MB030.064d.mgd.msft.net> | The problem 'Force' is trying to solve is the fact that Haskell | currently has many existing lifted data types, and they all have | ~essentially identical
unlifted versions. But for a user to write the | lifted and unlifted version, they have to copy paste their code or use | 'Force'. But (Force [a]) will only be head-strict. You still have to make an essentially-identical version if you want a strict list. Ditto all components of a data structure. Is the gain (of head-strictness) really worth it? Incidentally, on the Unlifted-vs-# discussion, I'm not against making the distinction. I can see advantages in carving out a strict subset of Haskell, which this would help to do. Simon From simonpj at microsoft.com Tue Sep 8 07:52:12 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 07:52:12 +0000 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net> Message-ID: | And to | be honest, I'm not sure we need arbitrary data types in Unlifted; | Force (which would be primitive) might be enough. That's an interesting thought. But presumably you'd have to use 'suspend' (a terrible name) a lot: type StrictList a = Force (StrictList' a) data StrictList' a = Nil | Cons !a (StrictList a) mapStrict :: (a -> b) -> StrictList a -> StrictList b mapStrict f xs = mapStrict' f (suspend xs) mapStrict' :: (a -> b) -> StrictList' a -> StrictList' b mapStrict' f Nil = Nil mapStrict' f (Cons x xs) = Cons (f x) (mapStrict f xs) That doesn't look terribly convenient. | ensure that threads don't simply | pass thunks between each other. But, if you have unlifted types, then | you can have: | | data UMVar (a :: Unlifted) | | and then the type rules out the possibility of passing thunks through | a reference (at least at the top level). Really? Presumably UMVar is a new primitive? With a family of operations like MVar? 
If so can't we just define newtype UMVar a = UMVar (MVar a) putUMVar :: UMVar a -> a -> IO () putUMVar (UMVar v) x = x `seq` putMVar v x I don't see Force helping here. Simon From marlowsd at gmail.com Tue Sep 8 07:53:00 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 08:53:00 +0100 Subject: Unpacking sum types In-Reply-To: References: Message-ID: <55EE93DC.7050409@gmail.com> On 07/09/2015 15:35, Simon Peyton Jones wrote: > Good start. > > I have updated the page to separate the source-language design (what the > programmer sees) from the implementation. > > And I have included boxed sums as well -- it would be deeply strange not > to do so. How did you envisage implementing anonymous boxed sums? What is their heap representation? One option is to use some kind of generic object with a dynamic number of pointers and non-pointers, and one field for the tag. The layout would need to be stored in the object. This isn't a particularly efficient representation, though. Perhaps there could be a family of smaller specialised versions for common sizes. Do we have a use case for the boxed version, or is it just for consistency? Cheers Simon > Looks good to me! > > Simon > > *From:*Johan Tibell [mailto:johan.tibell at gmail.com] > *Sent:* 01 September 2015 18:24 > *To:* Simon Peyton Jones; Simon Marlow; Ryan Newton > *Cc:* ghc-devs at haskell.org > *Subject:* RFC: Unpacking sum types > > I have a draft design for unpacking sum types that I'd like some > feedback on. In particular feedback both on: > > * the writing and clarity of the proposal and > > * the proposal itself.
> > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > > -- Johan > From mail at joachim-breitner.de Tue Sep 8 08:14:43 2015 From: mail at joachim-breitner.de (Joachim Breitner) Date: Tue, 08 Sep 2015 10:14:43 +0200 Subject: Unpacking sum types In-Reply-To: <55EE93DC.7050409@gmail.com> References: <55EE93DC.7050409@gmail.com> Message-ID: <1441700083.1307.7.camel@joachim-breitner.de> Hi, on Tuesday, 08.09.2015, at 08:53 +0100, Simon Marlow wrote: > On 07/09/2015 15:35, Simon Peyton Jones wrote: > > Good start. > > > > I have updated the page to separate the source-language design (what the > > programmer sees) from the implementation. > > > > And I have included boxed sums as well -- it would be deeply strange not > > to do so. > > How did you envisage implementing anonymous boxed sums? What is their > heap representation? > > One option is to use some kind of generic object with a dynamic number > of pointers and non-pointers, and one field for the tag. Why a dynamic number of pointers? All constructors of an anonymous sum type contain precisely one pointer (just like Left and Right do), as they are normal boxed, polymorphic data types. Also the constructors (0 of 1 | _ ) (0 of 2 | _ ) (0 of 3 | _ ) (using Lennart's syntax here) can all use the same info-table: At runtime, we only care about the tag, not the arity of the sum type. So just like for products, we could statically generate info tables for the constructors (0 of ? | _ ) (1 of ? | _ ) ... (63 of ? | _ ) and simply do not support more than these. (Or, if we really want to support these, start to nest them. 63^2 will already go a long way... :-)) Greetings, Joachim -- Joachim "nomeata" Breitner mail at joachim-breitner.de | http://www.joachim-breitner.de/ Jabber: nomeata at joachim-breitner.de | GPG-Key: 0xF0FBF51F Debian Developer: nomeata at debian.org -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From simonpj at microsoft.com Tue Sep 8 08:28:50 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 08:28:50 +0000 Subject: AnonymousSums data con syntax In-Reply-To: References: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> <1441657274.28403.7.camel@joachim-breitner.de> Message-ID: I can see the force of this discussion about data type constructors for sums, but: - We already do this for tuples: (,,,,) is a type constructor and you have to count commas. We could use a number here but we don't. - Likewise tuple sections: (,,e,) means (\x y z -> (x,y,e,z)). I do not expect big sums in practice. That said, (2/5| True) instead of (|True|||) would be ok I suppose. Or something like that. Simon From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Lennart Kolmodin Sent: 08 September 2015 07:12 To: Joachim Breitner Cc: ghc-devs at haskell.org Subject: Re: AnonymousSums data con syntax 2015-09-07 21:21 GMT+01:00 Joachim Breitner: Hi, on Monday, 07.09.2015, at 19:25 +0000, Simon Peyton Jones wrote: > > Are we okay with stealing some operator sections for this? E.G. (x > > > > ). I think the boxed sums larger than 2 choices are all technically overlapping with sections. > > I hadn't thought of that. I suppose that in distfix notation we > could require spaces > (x | |) > since vertical bar by itself isn't an operator. But then (_||) x > might feel more compact. > > Also a section (x ||) isn't valid in a pattern, so we would not need > to require spaces there. > > But my gut feel is: yes, with AnonymousSums we should just steal the > syntax. It won't hurt existing code (since it won't use > AnonymousSums), and if you *are* using AnonymousSums then the distfix > notation is probably more valuable than the sections for an operator > you probably aren't using.
I wonder if this syntax for constructors is really that great. Yes, there is a similarity with the type constructor (which is nice), but for the data constructor, do we really want a unary encoding and have our users count bars? I believe the user (and also us, having to read core) would be better served by some syntax that involves plain numbers. I reacted the same way to the proposed syntax. Imagine already having an anonymous sum type and then deciding to add another constructor. Naturally you'd have to update your code to handle the new constructor, but you also need to update the code for all other constructors as well by adding another bar in the right place. That seems unnecessary and there's no need to do that for named sum types. What about explicitly stating the index as a number? (1 | Int) :: ( String | Int | Bool ) (#1 | Int #) :: (# String | Int | Bool #) case sum of (0 | myString ) -> ... (1 | myInt ) -> ... (2 | myBool ) -> ... This allows you to at least add new constructors at the end without changing existing code. Is it harder to resolve by type inference since we're not stating the number of constructors? If so we could do something similar to Joachim's proposal; case sum of (0 of 3 | myString ) -> ... (1 of 3 | myInt ) -> ... (2 of 3 | myBool ) -> ... .. and at least you don't have to count bars. Given that of is already a keyword, how about something involving "3 of 4"? For example (Put# True in 3 of 5) :: (# a | b | Bool | d | e #) and case sum of (Put# x in 1 of 3) -> ... (Put# x in 2 of 3) -> ... (Put# x in 3 of 3) -> ... (If "as" were a keyword, (Put# x as 2 of 3) would sound even better.) I don't find this particular choice very great, but something with numbers rather than ASCII art seems to make more sense here. Is there something even better? Greetings, Joachim -- Joachim "nomeata" Breitner mail at joachim-breitner.de | http://www.joachim-breitner.de/ Jabber: nomeata at joachim-breitner.de |
GPG-Key: 0xF0FBF51F Debian Developer: nomeata at debian.org _______________________________________________ ghc-devs mailing list ghc-devs at haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekmett at gmail.com Tue Sep 8 08:29:36 2015 From: ekmett at gmail.com (Edward Kmett) Date: Tue, 8 Sep 2015 04:29:36 -0400 Subject: ArrayArrays In-Reply-To: <55EE90ED.1040609@gmail.com> References: <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> <55EE90ED.1040609@gmail.com> Message-ID: Once you start to include all the other primitive types there is a bit more of an explosion. MVar#, TVar#, MutVar#, Small variants, etc. can all be modified to carry unlifted content. Being able to be parametric over that choice would permit a number of things in user land to do the same thing with an open-ended set of design possibilities that are rather hard to contemplate in advance. e.g. being able to abstract over them could let you just use a normal (,) to carry around unlifted parametric data types or being able to talk about [MVar# s a] drastically reducing the number of one off data types we need to invent. If you can talk about the machinery mentioned above then you can have typeclasses parameterized on an argument that could be either unlifted or lifted. I'm not willing to fight too hard for it, but it feels more like the "right" solution than retaining a cut-and-paste copy of the same code and bifurcating further on each argument you want to consider such a degree of freedom. As such it seems like a pretty big win for a comparatively minor change to the levity polymorphism machinery. -Edward On Tue, Sep 8, 2015 at 3:40 AM, Simon Marlow wrote: > This would be very cool, however it's questionable whether it's worth it. 
> > Without any unlifted kind, we need > - ArrayArray# > - a set of new/read/write primops for every element type, > either built-in or made from unsafeCoerce# > > With the unlifted kind, we would need > - ArrayArray# > - one set of new/read/write primops > > With levity polymorphism, we would need > - none of this, Array# can be used > > So having an unlifted kind already kills a lot of the duplication, > polymorphism only kills a bit more. > > Cheers > Simon > > On 08/09/2015 00:14, Edward Kmett wrote: > >> Assume we had the ability to talk about Levity in a new way and instead >> of just: >> >> data Levity = Lifted | Unlifted >> >> type * = TYPE 'Lifted >> type # = TYPE 'Unlifted >> >> we instead had a more nuanced notion of TYPE parameterized on another >> data type: >> >> data Levity = Lifted | Unlifted >> data Param = Composite | Simple Levity >> >> and we parameterized TYPE with a Param rather than Levity. >> >> Existing strange representations can continue to live in TYPE 'Composite >> >> (# Int# , Double #) :: TYPE 'Composite >> >> and we don't support parametricity in there, just like, currently we >> don't allow parametricity in #. >> >> We can include the undefined example from Richard's talk: >> >> undefined :: forall (v :: Param). v >> >> and ultimately lift it into his pi type when it is available just as >> before. >> >> But we could consider TYPE ('Simple 'Unlifted) as a form of >> 'parametric #' covering unlifted things we're willing to allow >> polymorphism over because they are just pointers to something in the >> heap, that just happens to not be able to be _|_ or a thunk.
>> >> In this setting, recalling that above, I modified Richard's TYPE to take >> a Param instead of Levity, we can define a type alias for things that >> live as a simple pointer to a heap allocated object: >> >> type GC (l :: Levity) = TYPE ('Simple l) >> type * = GC 'Lifted >> >> and then we can look at existing primitives generalized: >> >> Array# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted >> MutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted >> SmallArray# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted >> SmallMutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC >> 'Unlifted >> MutVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted >> MVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted >> >> Weak#, StablePtr#, StableName#, etc. all can take similar modifications. >> >> Recall that an ArrayArray# was just an Array# hacked up to be able to >> hold onto the subset of # that is collectable. >> >> Almost all of the operations on these data types can work on the more >> general kind of argument. >> >> newArray# :: forall (s :: *) (l :: Levity) (a :: GC l). Int# -> a -> >> State# s -> (# State# s, MutableArray# s a #) >> >> writeArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# >> s a -> Int# -> a -> State# s -> State# s >> >> readArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s >> a -> Int# -> State# s -> (# State# s, a #) >> >> etc. >> >> Only a couple of our existing primitives _can't_ generalize this way. >> The one that leaps to mind is atomicModifyMutVar, which would need to >> stay constrained to only work on arguments in *, because of the way it >> operates. >> >> With that we can still talk about >> >> MutableArray# s Int >> >> but now we can also talk about: >> >> MutableArray# s (MutableArray# s Int) >> >> without the layer of indirection through a box in * and without an >> explosion of primops. 
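[Editor's note: to make the nesting Edward describes concrete, here is a sketch -- an editor's illustration, not code from the thread -- of what the generalized primops would buy. It assumes the *proposed* levity-polymorphic types of readArray#/writeArray# given above, so it does not compile with any released GHC:]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Editor's sketch only: assumes the proposed levity-polymorphic
-- readArray#/writeArray#. Today, MutableArray# s (MutableArray# s Int)
-- is not a legal instantiation without an ArrayArray#/unsafeCoerce#
-- workaround.
import GHC.Prim
import GHC.Types (IO(..))

-- Swap the inner arrays stored at indices i and j of a nested array,
-- with no box in * and no unsafeCoerce#:
swapInner :: MutableArray# RealWorld (MutableArray# RealWorld Int)
          -> Int# -> Int# -> IO ()
swapInner arr i j = IO $ \s0 ->
  case readArray# arr i s0 of { (# s1, a #) ->
  case readArray# arr j s1 of { (# s2, b #) ->
  (# writeArray# arr j a (writeArray# arr i b s2), () #) }}
```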
The same newFoo, readFoo, writeFoo machinery works >> for both kinds. >> >> The struct machinery doesn't get to take advantage of this, but it would >> let us clean house elsewhere in Prim and drastically improve the range >> of applicability of the existing primitives with nothing more than a >> small change to the levity machinery. >> >> I'm not attached to any of the names above, I coined them just to give >> us a concrete thing to talk about. >> >> Here I'm only proposing we extend machinery in GHC.Prim this way, but an >> interesting 'now that the barn door is open' question is to consider >> that our existing Haskell data types often admit a similar form of >> parametricity and nothing in principle prevents this from working for >> Maybe or [] and once you permit inference to fire across all of GC l >> then it seems to me that you'd start to get those same capabilities >> there as well when LevityPolymorphism was turned on. >> >> -Edward >> >> On Mon, Sep 7, 2015 at 5:56 PM, Simon Peyton Jones >> > wrote: >> >> This could make the menagerie of ways to pack >> {Small}{Mutable}Array{Array}# references into a >> {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing >> the need for folks to descend into the use of the more evil >> structure primitives we're talking about, and letting us keep a few >> more principles around us. >> >> I'm lost.
Can you give some concrete examples that illustrate how >> levity polymorphism will help us? >> >> >> Simon >> >> *From:*Edward Kmett [mailto:ekmett at gmail.com] >> *Sent:* 07 September 2015 21:17 >> *To:* Simon Peyton Jones >> *Cc:* Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T >> Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates >> *Subject:* Re: ArrayArrays >> >> I had a brief discussion with Richard during the Haskell Symposium >> about how we might be able to let parametricity help a bit in >> reducing the space of necessary primops to a slightly more >> manageable level. >> >> Notably, it'd be interesting to explore the ability to allow >> parametricity over the portion of # that is just a gcptr. >> >> We could do this if the levity polymorphism machinery was tweaked a >> bit. You could envision the ability to abstract over things in both >> * and the subset of # that are represented by a gcptr, then >> modifying the existing array primitives to be parametric in that >> choice of levity for their argument so long as it was of a "heap >> object" levity. >> >> This could make the menagerie of ways to pack >> {Small}{Mutable}Array{Array}# references into a >> {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing >> the need for folks to descend into the use of the more evil >> structure primitives we're talking about, and letting us keep a few >> more principles around us. >> >> Then in the cases like `atomicModifyMutVar#` where it needs to >> actually be in * rather than just a gcptr, due to the constructed >> field selectors it introduces on the heap then we could keep the >> existing less polymorphic type. >> >> -Edward >> >> On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones >> > wrote: >> >> It was fun to meet and discuss this. >> >> Did someone volunteer to write a
wiki page that describes the >> proposed design? And, I earnestly hope, also describes the >> menagerie of currently available array types and primops so that >> users can have some chance of picking the right one?! >> >> Thanks >> >> Simon >> >> *From:*ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Ryan Newton >> *Sent:* 31 August 2015 23:11 >> *To:* Edward Kmett; Johan Tibell >> *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; >> ghc-devs; Ryan Scott; Ryan Yates >> *Subject:* Re: ArrayArrays >> >> Dear Edward, Ryan Yates, and other interested parties -- >> >> So when should we meet up about this? >> >> May I propose the Tues afternoon break for everyone at ICFP who >> is interested in this topic? We can meet out in the coffee area >> and congregate around Edward Kmett, who is tall and should be >> easy to find ;-). >> >> I think Ryan is going to show us how to use his new primops for >> combined array + other fields in one heap object? >> >> On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote: >> >> Without a custom primitive it doesn't help much there, you >> have to store the indirection to the mask. >> >> With a custom primitive it should cut the on heap >> root-to-leaf path of everything in the HAMT in half. A >> shorter HashMap was actually one of the motivating factors >> for me doing this. It is rather astoundingly difficult to >> beat the performance of HashMap, so I had to start cheating >> pretty badly. ;) >> >> -Edward >> >> On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell >> wrote: >> >> I'd also be interested to chat at ICFP to see if I can >> use this for my HAMT implementation. >> >> On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett >> wrote: >> >> Sounds good to me.
Right now I'm just hacking up >> composable accessors for "typed slots" in a fairly >> lens-like fashion, and treating the set of slots I >> define and the 'new' function I build for the data >> type as its API, and build atop that. This could >> eventually graduate to template-haskell, but I'm not >> entirely satisfied with the solution I have. I >> currently distinguish between what I'm calling >> "slots" (things that point directly to another >> SmallMutableArrayArray# sans wrapper) and "fields" >> which point directly to the usual Haskell data types >> because unifying the two notions meant that I >> couldn't lift some coercions out "far enough" to >> make them vanish. >> >> I'll be happy to run through my current working set >> of issues in person and -- as things get nailed down >> further -- in a longer lived medium than in personal >> conversations. ;) >> >> -Edward >> >> On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton >> wrote: >> >> I'd also love to meet up at ICFP and discuss >> this. I think the array primops plus a TH layer >> that lets (ab)use them many times without too >> much marginal cost sounds great. And I'd like >> to learn how we could be either early users of, >> or help with, this infrastructure. >> >> CC'ing in Ryan Scot and Omer Agacan who may also >> be interested in dropping in on such discussions >> @ICFP, and Chao-Hong Chen, a Ph.D. student who >> is currently working on concurrent data >> structures in Haskell, but will not be at >> ICFP. >> >> On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates >> wrote: >> >> I completely agree. I would love to spend >> some time during ICFP and >> friends talking about what it could look >> like. My small array for STM >> changes for the RTS can be seen here [1].
It is on a branch somewhere >> between 7.8 and 7.10 and includes irrelevant >> STM bits and some >> confusing naming choices (sorry), but should >> cover all the details >> needed to implement it for a non-STM >> context. The biggest surprise >> for me was following small array too closely >> and having a word/byte >> offset mismatch [2]. >> >> [1]: >> >> https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut >> [2]: >> https://ghc.haskell.org/trac/ghc/ticket/10413 >> >> Ryan >> >> >> On Fri, Aug 28, 2015 at 10:09 PM, Edward >> Kmett > > wrote: >> > I'd love to have that last 10%, but it's a >> lot of work to get there and more >> > importantly I don't know quite what it >> should look like. >> > >> > On the other hand, I do have a pretty >> good idea of how the primitives above >> > could be banged out and tested in a long >> evening, well in time for 7.12. And >> > as noted earlier, those remain useful >> even if a nicer typed version with an >> > extra level of indirection to the sizes >> is built up after. >> > >> > The rest sounds like a good graduate >> student project for someone who has >> > graduate students lying around. Maybe >> somebody at Indiana University who has >> > an interest in type theory and >> parallelism can find us one. =) >> > >> > -Edward >> > >> > On Fri, Aug 28, 2015 at 8:48 PM, Ryan >> Yates > > wrote: >> >> >> >> I think from my perspective, the >> motivation for getting the type >> >> checker involved is primarily bringing >> this to the level where users >> >> could be expected to build these >> structures. It is reasonable to >> >> think that there are people who want to >> use STM (a context with >> >> mutation already) to implement a >> straightforward data structure that >> >> avoids extra indirection penalty.
There >> should be some places where >> >> knowing that things are field accesses >> rather then array indexing >> >> could be helpful, but I think GHC is >> good right now about handling >> >> constant offsets. In my code I don't do >> any bounds checking as I know >> >> I will only be accessing my arrays with >> constant indexes. I make >> >> wrappers for each field access and leave >> all the unsafe stuff in >> >> there. When things go wrong though, the >> compiler is no help. Maybe >> >> template Haskell that generates the >> appropriate wrappers is the right >> >> direction to go. >> >> There is another benefit for me when >> working with these as arrays in >> >> that it is quite simple and direct >> (given the hoops already jumped >> >> through) to play with alignment. I can >> ensure two pointers are never >> >> on the same cache-line by just spacing >> things out in the array. >> >> >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward >> Kmett > > wrote: >> >> > They just segfault at this level. ;) >> >> > >> >> > Sent from my iPhone >> >> > >> >> > On Aug 28, 2015, at 7:25 PM, Ryan >> Newton > > wrote: >> >> > >> >> > You presumably also save a bounds >> check on reads by hard-coding the >> >> > sizes? >> >> > >> >> > On Fri, Aug 28, 2015 at 3:39 PM, >> Edward Kmett > > wrote: >> >> >> >> >> >> Also there are 4 different "things" >> here, basically depending on two >> >> >> independent questions: >> >> >> >> >> >> a.) if you want to shove the sizes >> into the info table, and >> >> >> b.) if you want cardmarking. >> >> >> >> >> >> Versions with/without cardmarking for >> different sizes can be done >> >> >> pretty >> >> >> easily, but as noted, the infotable >> variants are pretty invasive. >> >> >> >> >> >> -Edward >> >> >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, >> Edward Kmett > > wrote: >> >> >>> >> >> >>> Well, on the plus side you'd save 16 >> bytes per object, which adds up >> >> >>> if >> >> >>> they were small enough and there are >> enough of them. 
You get a bit >> >> >>> better >> >> >>> locality of reference in terms of >> what fits in the first cache line of >> >> >>> them. >> >> >>> >> >> >>> -Edward >> >> >>> >> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, >> Ryan Newton > > >> >> >>> wrote: >> >> >>>> >> >> >>>> Yes. And for the short term I can >> imagine places we will settle with >> >> >>>> arrays even if it means tracking >> lengths unnecessarily and >> >> >>>> unsafeCoercing >> >> >>>> pointers whose types don't actually >> match their siblings. >> >> >>>> >> >> >>>> Is there anything to recommend the >> hacks mentioned for fixed sized >> >> >>>> array >> >> >>>> objects *other* than using them to >> fake structs? (Much to >> >> >>>> derecommend, as >> >> >>>> you mentioned!) >> >> >>>> >> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM >> Edward Kmett > > >> >> >>>> wrote: >> >> >>>>> >> >> >>>>> I think both are useful, but the >> one you suggest requires a lot more >> >> >>>>> plumbing and doesn't subsume all >> of the usecases of the other. >> >> >>>>> >> >> >>>>> -Edward >> >> >>>>> >> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, >> Ryan Newton > > >> >> >>>>> wrote: >> >> >>>>>> >> >> >>>>>> So that primitive is an array >> like thing (Same pointed type, >> >> >>>>>> unbounded >> >> >>>>>> length) with extra payload. >> >> >>>>>> >> >> >>>>>> I can see how we can do without >> structs if we have arrays, >> >> >>>>>> especially >> >> >>>>>> with the extra payload at front. >> But wouldn't the general solution >> >> >>>>>> for >> >> >>>>>> structs be one that that allows >> new user data type defs for # >> >> >>>>>> types? 
>> >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM >> Edward Kmett > > >> >> >>>>>> wrote: >> >> >>>>>>> >> >> >>>>>>> Some form of MutableStruct# with >> a known number of words and a >> >> >>>>>>> known >> >> >>>>>>> number of pointers is basically >> what Ryan Yates was suggesting >> >> >>>>>>> above, but >> >> >>>>>>> where the word counts were >> stored in the objects themselves. >> >> >>>>>>> >> >> >>>>>>> Given that it'd have a couple of >> words for those counts it'd >> >> >>>>>>> likely >> >> >>>>>>> want to be something we build in >> addition to MutVar# rather than a >> >> >>>>>>> replacement. >> >> >>>>>>> >> >> >>>>>>> On the other hand, if we had to >> fix those numbers and build info >> >> >>>>>>> tables that knew them, and >> typechecker support, for instance, it'd >> >> >>>>>>> get >> >> >>>>>>> rather invasive. >> >> >>>>>>> >> >> >>>>>>> Also, a number of things that we >> can do with the 'sized' versions >> >> >>>>>>> above, like working with evil >> unsized c-style arrays directly >> >> >>>>>>> inline at the >> >> >>>>>>> end of the structure cease to be >> possible, so it isn't even a pure >> >> >>>>>>> win if we >> >> >>>>>>> did the engineering effort. >> >> >>>>>>> >> >> >>>>>>> I think 90% of the needs I have >> are covered just by adding the one >> >> >>>>>>> primitive. The last 10% gets >> pretty invasive. >> >> >>>>>>> >> >> >>>>>>> -Edward >> >> >>>>>>> >> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, >> Ryan Newton > > >> >> >>>>>>> wrote: >> >> >>>>>>>> >> >> >>>>>>>> I like the possibility of a >> general solution for mutable structs >> >> >>>>>>>> (like Ed said), and I'm trying >> to fully understand why it's hard. >> >> >>>>>>>> >> >> >>>>>>>> So, we can't unpack MutVar into >> constructors because of object >> >> >>>>>>>> identity problems. 
But what >> about directly supporting an >> >> >>>>>>>> extensible set of >> >> >>>>>>>> unlifted MutStruct# objects, >> generalizing (and even replacing) >> >> >>>>>>>> MutVar#? That >> >> >>>>>>>> may be too much work, but is it >> problematic otherwise? >> >> >>>>>>>> >> >> >>>>>>>> Needless to say, this is also >> critical if we ever want best in >> >> >>>>>>>> class >> >> >>>>>>>> lockfree mutable structures, >> just like their Stm and sequential >> >> >>>>>>>> counterparts. >> >> >>>>>>>> >> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM >> Simon Peyton Jones >> >> >>>>>>>> > > wrote: >> >> >>>>>>>>> >> >> >>>>>>>>> At the very least I'll take >> this email and turn it into a short >> >> >>>>>>>>> article. >> >> >>>>>>>>> >> >> >>>>>>>>> Yes, please do make it into a >> wiki page on the GHC Trac, and >> >> >>>>>>>>> maybe >> >> >>>>>>>>> make a ticket for it. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Thanks >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Simon >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> From: Edward Kmett >> [mailto:ekmett at gmail.com >> ] >> >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >> >>>>>>>>> To: Simon Peyton Jones >> >> >>>>>>>>> Cc: Manuel M T Chakravarty; >> Simon Marlow; ghc-devs >> >> >>>>>>>>> Subject: Re: ArrayArrays >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> An ArrayArray# is just an >> Array# with a modified invariant. It >> >> >>>>>>>>> points directly to other >> unlifted ArrayArray#'s or ByteArray#'s. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> While those live in #, they >> are garbage collected objects, so >> >> >>>>>>>>> this >> >> >>>>>>>>> all lives on the heap. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> They were added to make some >> of the DPH stuff fast when it has >> >> >>>>>>>>> to >> >> >>>>>>>>> deal with nested arrays. 
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I'm currently abusing them as >> a placeholder for a better thing. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> The Problem >> >> >>>>>>>>> >> >> >>>>>>>>> ----------------- >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Consider the scenario where >> you write a classic doubly-linked >> >> >>>>>>>>> list >> >> >>>>>>>>> in Haskell. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (IORef (Maybe >> DLL) (IORef (Maybe DLL) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Chasing from one DLL to the >> next requires following 3 pointers >> >> >>>>>>>>> on >> >> >>>>>>>>> the heap. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> >> MutVar# RealWorld (Maybe DLL) ~> >> >> >>>>>>>>> Maybe >> >> >>>>>>>>> DLL ~> DLL >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> That is 3 levels of indirection. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We can trim one by simply >> unpacking the IORef with >> >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We can trim another by adding >> a 'Nil' constructor for DLL and >> >> >>>>>>>>> worsening our representation. 
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL !(IORef DLL) >> !(IORef DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> but now we're still stuck with >> a level of indirection >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL >> ~> DLL >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> This means that every >> operation we perform on this structure >> >> >>>>>>>>> will >> >> >>>>>>>>> be about half of the speed of >> an implementation in most other >> >> >>>>>>>>> languages >> >> >>>>>>>>> assuming we're memory bound on >> loading things into cache! >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Making Progress >> >> >>>>>>>>> >> >> >>>>>>>>> ---------------------- >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I have been working on a >> number of data structures where the >> >> >>>>>>>>> indirection of going from >> something in * out to an object in # >> >> >>>>>>>>> which >> >> >>>>>>>>> contains the real pointer to >> my target and coming back >> >> >>>>>>>>> effectively doubles >> >> >>>>>>>>> my runtime. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> We go out to the MutVar# >> because we are allowed to put the >> >> >>>>>>>>> MutVar# >> >> >>>>>>>>> onto the mutable list when we >> dirty it. There is a well defined >> >> >>>>>>>>> write-barrier. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I could change out the >> representation to use >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL (MutableArray# >> RealWorld DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I can just store two pointers >> in the MutableArray# every time, >> >> >>>>>>>>> but >> >> >>>>>>>>> this doesn't help _much_ >> directly. 
It has reduced the amount of >> >> >>>>>>>>> distinct >> >> >>>>>>>>> addresses in memory I touch on >> a walk of the DLL from 3 per >> >> >>>>>>>>> object to 2. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> I still have to go out to the >> heap from my DLL and get to the >> >> >>>>>>>>> array >> >> >>>>>>>>> object and then chase it to >> the next DLL and chase that to the >> >> >>>>>>>>> next array. I >> >> >>>>>>>>> do get my two pointers >> together in memory though. I'm paying for >> >> >>>>>>>>> a card >> >> >>>>>>>>> marking table as well, which I >> don't particularly need with just >> >> >>>>>>>>> two >> >> >>>>>>>>> pointers, but we can shed that >> with the "SmallMutableArray#" >> >> >>>>>>>>> machinery added >> >> >>>>>>>>> back in 7.10, which is just >> the old array code a a new data >> >> >>>>>>>>> type, which can >> >> >>>>>>>>> speed things up a bit when you >> don't have very big arrays: >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL >> (SmallMutableArray# RealWorld DLL) | Nil >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> But what if I wanted my object >> itself to live in # and have two >> >> >>>>>>>>> mutable fields and be able to >> share the sme write barrier? >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> An ArrayArray# points directly >> to other unlifted array types. >> >> >>>>>>>>> What >> >> >>>>>>>>> if we have one # -> * wrapper >> on the outside to deal with the >> >> >>>>>>>>> impedence >> >> >>>>>>>>> mismatch between the >> imperative world and Haskell, and then just >> >> >>>>>>>>> let the >> >> >>>>>>>>> ArrayArray#'s hold other >> arrayarrays. 
>> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> data DLL = DLL >> (MutableArrayArray# RealWorld) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> now I need to make up a new >> Nil, which I can just make be a >> >> >>>>>>>>> special >> >> >>>>>>>>> MutableArrayArray# I allocate >> on program startup. I can even >> >> >>>>>>>>> abuse pattern >> >> >>>>>>>>> synonyms. Alternately I can >> exploit the internals further to >> >> >>>>>>>>> make this >> >> >>>>>>>>> cheaper. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Then I can use the >> readMutableArrayArray# and >> >> >>>>>>>>> writeMutableArrayArray# calls >> to directly access the preceding >> >> >>>>>>>>> and next >> >> >>>>>>>>> entry in the linked list. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> So now we have one DLL wrapper >> which just 'bootstraps me' into a >> >> >>>>>>>>> strict world, and everything >> there lives in #. >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> next :: DLL -> IO DLL >> >> >>>>>>>>> >> >> >>>>>>>>> next (DLL m) = IO $ \s -> case >> readMutableArrayArray# s of >> >> >>>>>>>>> >> >> >>>>>>>>> (# s', n #) -> (# s', DLL n >> #) >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> It turns out GHC is quite >> happy to optimize all of that code to >> >> >>>>>>>>> keep things unboxed. The 'DLL' >> wrappers get removed pretty >> >> >>>>>>>>> easily when they >> >> >>>>>>>>> are known strict and you chain > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From simonpj at microsoft.com Tue Sep 8 08:31:45 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 08:31:45 +0000 Subject: Unpacking sum types In-Reply-To: <55EE93DC.7050409@gmail.com> References: <55EE93DC.7050409@gmail.com> Message-ID: | How did you envisage implementing anonymous boxed sums? What is their | heap representation? 
*Exactly* like tuples; that is, we have a family of data type declarations: data (a|b) = (_|) a | (|_) b data (a|b|c) = (_||) a | (|_|) b | (||_) c ..etc. Simon | | One option is to use some kind of generic object with a dynamic number | of pointers and non-pointers, and one field for the tag. The layout | would need to be stored in the object. This isn't a particularly | efficient representation, though. Perhaps there could be a family of | smaller specialised versions for common sizes. | | Do we have a use case for the boxed version, or is it just for | consistency? | | Cheers | Simon | | | > Looks good to me! | > | > Simon | > | > *From:*Johan Tibell [mailto:johan.tibell at gmail.com] | > *Sent:* 01 September 2015 18:24 | > *To:* Simon Peyton Jones; Simon Marlow; Ryan Newton | > *Cc:* ghc-devs at haskell.org | > *Subject:* RFC: Unpacking sum types | > | > I have a draft design for unpacking sum types that I'd like some | > feedback on. In particular feedback both on: | > | > * the writing and clarity of the proposal and | > | > * the proposal itself. | > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes | > | > -- Johan | > From marlowsd at gmail.com Tue Sep 8 08:54:36 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 09:54:36 +0100 Subject: Unpacking sum types In-Reply-To: References: <55EE93DC.7050409@gmail.com> Message-ID: <55EEA24C.9080504@gmail.com> On 08/09/2015 09:31, Simon Peyton Jones wrote: > | How did you envisage implementing anonymous boxed sums? What is their > | heap representation? > > *Exactly* like tuples; that is, we have a family of data type declarations: > > data (a|b) = (_|) a > | (|_) b > > data (a|b|c) = (_||) a > | (|_|) b > | (||_) c > ..etc. I see, but then you can't have multiple fields, like ( (# Int,Bool #) |) You'd have to box the inner tuple too. Ok, I suppose. 
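[Editor's sketch, not part of the thread: the "family of data type declarations" reading of anonymous boxed sums above, and Marlow's multi-field point, can be written down with ordinary named data types, since constructors like (_|) are not valid source syntax. Sum2, Sum3, and the S*_* constructor names are invented stand-ins.]

```haskell
-- Stand-ins for the proposed anonymous sums (a|b) and (a|b|c):
data Sum2 a b   = S2_1 a | S2_2 b          deriving Show
data Sum3 a b c = S3_1 a | S3_2 b | S3_3 c deriving Show

-- Marlow's observation: an alternative carrying several fields, like
-- ( (# Int, Bool #) | ), has to box them as a single tuple payload here:
multiField :: Sum2 (Int, Bool) Char
multiField = S2_1 (42, True)

main :: IO ()
main = print multiField
```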
Cheers Simon > Simon > > | > | One option is to use some kind of generic object with a dynamic number > | of pointers and non-pointers, and one field for the tag. The layout > | would need to be stored in the object. This isn't a particularly > | efficient representation, though. Perhaps there could be a family of > | smaller specialised versions for common sizes. > | > | Do we have a use case for the boxed version, or is it just for > | consistency? > | > | Cheers > | Simon > | > | > | > Looks good to me! > | > > | > Simon > | > > | > *From:*Johan Tibell [mailto:johan.tibell at gmail.com] > | > *Sent:* 01 September 2015 18:24 > | > *To:* Simon Peyton Jones; Simon Marlow; Ryan Newton > | > *Cc:* ghc-devs at haskell.org > | > *Subject:* RFC: Unpacking sum types > | > > | > I have a draft design for unpacking sum types that I'd like some > | > feedback on. In particular feedback both on: > | > > | > * the writing and clarity of the proposal and > | > > | > * the proposal itself. > | > > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > | > > | > -- Johan > | > > From marlowsd at gmail.com Tue Sep 8 08:54:48 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 09:54:48 +0100 Subject: Unpacking sum types In-Reply-To: References: <55EE93DC.7050409@gmail.com> Message-ID: <55EEA258.2030200@gmail.com> On 08/09/2015 09:31, Simon Peyton Jones wrote: > | How did you envisage implementing anonymous boxed sums? What is their > | heap representation? > > *Exactly* like tuples; that is, we have a family of data type declarations: > > data (a|b) = (_|) a > | (|_) b > > data (a|b|c) = (_||) a > | (|_|) b > | (||_) c > ..etc. I see, but then you can't have multiple fields, like ( (# Int,Bool #) |) You'd have to box the inner tuple too. Ok, I suppose. Cheers Simon > Simon > > | > | One option is to use some kind of generic object with a dynamic number > | of pointers and non-pointers, and one field for the tag. 
The layout > | would need to be stored in the object. This isn't a particularly > | efficient representation, though. Perhaps there could be a family of > | smaller specialised versions for common sizes. > | > | Do we have a use case for the boxed version, or is it just for > | consistency? > | > | Cheers > | Simon > | > | > | > Looks good to me! > | > > | > Simon > | > > | > *From:*Johan Tibell [mailto:johan.tibell at gmail.com] > | > *Sent:* 01 September 2015 18:24 > | > *To:* Simon Peyton Jones; Simon Marlow; Ryan Newton > | > *Cc:* ghc-devs at haskell.org > | > *Subject:* RFC: Unpacking sum types > | > > | > I have a draft design for unpacking sum types that I'd like some > | > feedback on. In particular feedback both on: > | > > | > * the writing and clarity of the proposal and > | > > | > * the proposal itself. > | > > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > | > > | > -- Johan > | > > From marlowsd at gmail.com Tue Sep 8 08:56:46 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 09:56:46 +0100 Subject: ArrayArrays In-Reply-To: References: <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> <55EE90ED.1040609@gmail.com> Message-ID: <55EEA2CE.6020606@gmail.com> On 08/09/2015 09:29, Edward Kmett wrote: > Once you start to include all the other primitive types there is a bit > more of an explosion. MVar#, TVar#, MutVar#, Small variants, etc. can > all be modified to carry unlifted content. Yep, that's a fair point. Cheers Simon > Being able to be parametric over that choice would permit a number of > things in user land to do the same thing with an open-ended set of > design possibilities that are rather hard to contemplate in advance. > e.g. being able to abstract over them could let you just use a normal > (,) to carry around unlifted parametric data types or being able to talk > about [MVar# s a] drastically reducing the number of one off data types > we need to invent. 
> > If you can talk about the machinery mentioned above then you can have > typeclasses parameterized on an argument that could be either unlifted > or lifted. > > I'm not willing to fight too hard for it, but it feels more like the > "right" solution than retaining a cut-and-paste copy of the same code > and bifurcating further on each argument you want to consider such a > degree of freedom. > > As such it seems like a pretty big win for a comparatively minor change > to the levity polymorphism machinery. > > -Edward > > On Tue, Sep 8, 2015 at 3:40 AM, Simon Marlow > wrote: > > This would be very cool, however it's questionable whether it's > worth it. > > Without any unlifted kind, we need > - ArrayArray# > - a set of new/read/write primops for every element type, > either built-in or made from unsafeCoerce# > > With the unlifted kind, we would need > - ArrayArray# > - one set of new/read/write primops > > With levity polymorphism, we would need > - none of this, Array# can be used > > So having an unlifted kind already kills a lot of the duplication, > polymorphism only kills a bit more. > > Cheers > Simon > > On 08/09/2015 00:14, Edward Kmett wrote: > > Assume we had the ability to talk about Levity in a new way and > instead > of just: > > data Levity = Lifted | Unlifted > > type * = TYPE 'Lifted > type # = TYPE 'Unlifted > > we replace had a more nuanced notion of TYPE parameterized on > another > data type: > > data Levity = Lifted | Unlifted > data Param = Composite | Simple Levity > > and we parameterized TYPE with a Param rather than Levity. > > Existing strange representations can continue to live in TYPE > 'Composite > > (# Int# , Double #) :: TYPE 'Composite > > and we don't support parametricity in there, just like, currently we > don't allow parametricity in #. > > We can include the undefined example from Richard's talk: > > undefined :: forall (v :: Param). v > > and ultimately lift it into his pi type when it is available > just as before. 
> > But we could let consider TYPE ('Simple 'Unlifted) as a form of > 'parametric #' covering unlifted things we're willing to allow > polymorphism over because they are just pointers to something in the > heap, that just happens to not be able to be _|_ or a thunk. > > In this setting, recalling that above, I modified Richard's TYPE > to take > a Param instead of Levity, we can define a type alias for things > that > live as a simple pointer to a heap allocated object: > > type GC (l :: Levity) = TYPE ('Simple l) > type * = GC 'Lifted > > and then we can look at existing primitives generalized: > > Array# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted > MutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC > 'Unlifted > SmallArray# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted > SmallMutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC > 'Unlifted > MutVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted > MVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted > > Weak#, StablePtr#, StableName#, etc. all can take similar > modifications. > > Recall that an ArrayArray# was just an Array# hacked up to be > able to > hold onto the subset of # that is collectable. > > Almost all of the operations on these data types can work on the > more > general kind of argument. > > newArray# :: forall (s :: *) (l :: Levity) (a :: GC l). Int# -> a -> > State# s -> (# State# s, MutableArray# s a #) > > writeArray# :: forall (s :: *) (l :: Levity) (a :: GC l). > MutableArray# > s a -> Int# -> a -> State# s -> State# s > > readArray# :: forall (s :: *) (l :: Levity) (a :: GC l). > MutableArray# s > a -> Int# -> State# s -> (# State# s, a #) > > etc. > > Only a couple of our existing primitives _can't_ generalize this > way. > The one that leaps to mind is atomicModifyMutVar, which would > need to > stay constrained to only work on arguments in *, because of the > way it > operates. 
> > With that we can still talk about > > MutableArray# s Int > > but now we can also talk about: > > MutableArray# s (MutableArray# s Int) > > without the layer of indirection through a box in * and without an > explosion of primops. The same newFoo, readFoo, writeFoo > machinery works > for both kinds. > > The struct machinery doesn't get to take advantage of this, but > it would > let us clean house elsewhere in Prim and drastically improve the > range > of applicability of the existing primitives with nothing more than a > small change to the levity machinery. > > I'm not attached to any of the names above, I coined them just > to give > us a concrete thing to talk about. > > Here I'm only proposing we extend machinery in GHC.Prim this > way, but an > interesting 'now that the barn door is open' question is to consider > that our existing Haskell data types often admit a similar form of > parametricity and nothing in principle prevents this from > working for > Maybe or [] and once you permit inference to fire across all of GC l > then it seems to me that you'd start to get those same capabilities > there as well when LevityPolymorphism was turned on. > > -Edward > > On Mon, Sep 7, 2015 at 5:56 PM, Simon Peyton Jones > > >> > wrote: > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, > reducing > the need for folks to descend into the use of the more evil > structure primitives we're talking about, and letting us > keep a few > more principles around us.____ > > __ __ > > I?m lost. 
Can you give some concrete examples that > illustrate how > levity polymorphism will help us?____ > > > Simon____ > > __ __ > > *From:*Edward Kmett [mailto:ekmett at gmail.com > >] > *Sent:* 07 September 2015 21:17 > *To:* Simon Peyton Jones > *Cc:* Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T > Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays____ > > __ __ > > I had a brief discussion with Richard during the Haskell > Symposium > about how we might be able to let parametricity help a bit in > reducing the space of necessarily primops to a slightly more > manageable level. ____ > > __ __ > > Notably, it'd be interesting to explore the ability to allow > parametricity over the portion of # that is just a gcptr.____ > > __ __ > > We could do this if the levity polymorphism machinery was > tweaked a > bit. You could envision the ability to abstract over things > in both > * and the subset of # that are represented by a gcptr, then > modifying the existing array primitives to be parametric in > that > choice of levity for their argument so long as it was of a > "heap > object" levity.____ > > __ __ > > This could make the menagerie of ways to pack > {Small}{Mutable}Array{Array}# references into a > {Small}{Mutable}Array{Array}#' actually typecheck soundly, > reducing > the need for folks to descend into the use of the more evil > structure primitives we're talking about, and letting us > keep a few > more principles around us.____ > > __ __ > > Then in the cases like `atomicModifyMutVar#` where it needs to > actually be in * rather than just a gcptr, due to the > constructed > field selectors it introduces on the heap then we could > keep the > existing less polymorphic type.____ > > __ __ > > -Edward____ > > __ __ > > On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones > > >> > wrote:____ > > It was fun to meet and discuss this.____ > > ____ > > Did someone volunteer to write a wiki page that > describes the > proposed 
design? And, I earnestly hope, also describes the > menagerie of currently available array types and > primops so that > users can have some chance of picking the right one?!____ > > ____ > > Thanks____ > > ____ > > Simon____ > > ____ > > *From:*ghc-devs [mailto:ghc-devs-bounces at haskell.org > > >] *On Behalf Of *Ryan Newton > *Sent:* 31 August 2015 23:11 > *To:* Edward Kmett; Johan Tibell > *Cc:* Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; > ghc-devs; Ryan Scott; Ryan Yates > *Subject:* Re: ArrayArrays____ > > ____ > > Dear Edward, Ryan Yates, and other interested parties > -- ____ > > ____ > > So when should we meet up about this?____ > > ____ > > May I propose the Tues afternoon break for everyone at > ICFP who > is interested in this topic? We can meet out in the > coffee area > and congregate around Edward Kmett, who is tall and > should be > easy to find ;-).____ > > ____ > > I think Ryan is going to show us how to use his new > primops for > combined array + other fields in one heap object?____ > > ____ > > On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett > > >> > wrote:____ > > Without a custom primitive it doesn't help much > there, you > have to store the indirection to the mask.____ > > ____ > > With a custom primitive it should cut the on heap > root-to-leaf path of everything in the HAMT in half. A > shorter HashMap was actually one of the motivating > factors > for me doing this. It is rather astoundingly > difficult to > beat the performance of HashMap, so I had to start > cheating > pretty badly. ;)____ > > ____ > > -Edward____ > > ____ > > On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell > >> > wrote:____ > > I'd also be interested to chat at ICFP to see > if I can > use this for my HAMT implementation.____ > > ____ > > On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett > > >> wrote:____ > > Sounds good to me. 
Right now I'm just > hacking up > composable accessors for "typed slots" in a > fairly > lens-like fashion, and treating the set of > slots I > define and the 'new' function I build for > the data > type as its API, and build atop that. This > could > eventually graduate to template-haskell, > but I'm not > entirely satisfied with the solution I have. I > currently distinguish between what I'm calling > "slots" (things that point directly to another > SmallMutableArrayArray# sans wrapper) and > "fields" > which point directly to the usual Haskell > data types > because unifying the two notions meant that I > couldn't lift some coercions out "far > enough" to > make them vanish.____ > > ____ > > I'll be happy to run through my current > working set > of issues in person and -- as things get > nailed down > further -- in a longer lived medium than in > personal > conversations. ;)____ > > ____ > > -Edward____ > > ____ > > On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton > >> > wrote:____ > > I'd also love to meet up at ICFP and > discuss > this. I think the array primops plus a > TH layer > that lets (ab)use them many times > without too > much marginal cost sounds great. And > I'd like > to learn how we could be either early > users of, > or help with, this infrastructure.____ > > ____ > > CC'ing in Ryan Scot and Omer Agacan who > may also > be interested in dropping in on such > discussions > @ICFP, and Chao-Hong Chen, a Ph.D. > student who > is currently working on concurrent data > structures in Haskell, but will not be > at ICFP.____ > > ____ > > ____ > > On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates > > >> wrote:____ > > I completely agree. I would love > to spend > some time during ICFP and > friends talking about what it could > look > like. My small array for STM > changes for the RTS can be seen > here [1]. 
> It is on a branch somewhere > between 7.8 and 7.10 and includes > irrelevant > STM bits and some > confusing naming choices (sorry), > but should > cover all the details > needed to implement it for a non-STM > context. The biggest surprise > for me was following small array > too closely > and having a word/byte > offset miss-match [2]. > > [1]: > https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut > [2]: > https://ghc.haskell.org/trac/ghc/ticket/10413 > > Ryan____ > > > On Fri, Aug 28, 2015 at 10:09 PM, > Edward > Kmett > >> wrote: > > I'd love to have that last 10%, > but its a > lot of work to get there and more > > importantly I don't know quite > what it > should look like. > > > > On the other hand, I do have a > pretty > good idea of how the primitives above > > could be banged out and tested > in a long > evening, well in time for 7.12. And > > as noted earlier, those remain > useful > even if a nicer typed version with an > > extra level of indirection to > the sizes > is built up after. > > > > The rest sounds like a good graduate > student project for someone who has > > graduate students lying around. > Maybe > somebody at Indiana University who has > > an interest in type theory and > parallelism can find us one. =) > > > > -Edward > > > > On Fri, Aug 28, 2015 at 8:48 PM, > Ryan > Yates > >> wrote: > >> > >> I think from my perspective, the > motivation for getting the type > >> checker involved is primarily > bringing > this to the level where users > >> could be expected to build these > structures. it is reasonable to > >> think that there are people who > want to > use STM (a context with > >> mutation already) to implement a > straight forward data structure that > >> avoids extra indirection > penalty. There > should be some places where > >> knowing that things are field > accesses > rather then array indexing > >> could be helpful, but I think > GHC is > good right now about handling > >> constant offsets. 
In my code I > don't do > any bounds checking as I know > >> I will only be accessing my > arrays with > constant indexes. I make > >> wrappers for each field access > and leave > all the unsafe stuff in > >> there. When things go wrong > though, the > compiler is no help. Maybe > >> template Haskell that generates the > appropriate wrappers is the right > >> direction to go. > >> There is another benefit for me > when > working with these as arrays in > >> that it is quite simple and direct > (given the hoops already jumped > >> through) to play with > alignment. I can > ensure two pointers are never > >> on the same cache-line by just > spacing > things out in the array. > >> > >> On Fri, Aug 28, 2015 at 7:33 > PM, Edward > Kmett > >> wrote: > >> > They just segfault at this > level. ;) > >> > > >> > Sent from my iPhone > >> > > >> > On Aug 28, 2015, at 7:25 PM, Ryan > Newton > >> wrote: > >> > > >> > You presumably also save a bounds > check on reads by hard-coding the > >> > sizes? > >> > > >> > On Fri, Aug 28, 2015 at 3:39 PM, > Edward Kmett > >> wrote: > >> >> > >> >> Also there are 4 different > "things" > here, basically depending on two > >> >> independent questions: > >> >> > >> >> a.) if you want to shove the > sizes > into the info table, and > >> >> b.) if you want cardmarking. > >> >> > >> >> Versions with/without > cardmarking for > different sizes can be done > >> >> pretty > >> >> easily, but as noted, the > infotable > variants are pretty invasive. > >> >> > >> >> -Edward > >> >> > >> >> On Fri, Aug 28, 2015 at 6:36 PM, > Edward Kmett > >> wrote: > >> >>> > >> >>> Well, on the plus side > you'd save 16 > bytes per object, which adds up > >> >>> if > >> >>> they were small enough and > there are > enough of them. You get a bit > >> >>> better > >> >>> locality of reference in > terms of > what fits in the first cache line of > >> >>> them. 
> >> >>> > >> >>> -Edward > >> >>> > >> >>> On Fri, Aug 28, 2015 at > 6:14 PM, > Ryan Newton > >> > >> >>> wrote: > >> >>>> > >> >>>> Yes. And for the short > term I can > imagine places we will settle with > >> >>>> arrays even if it means > tracking > lengths unnecessarily and > >> >>>> unsafeCoercing > >> >>>> pointers whose types don't > actually > match their siblings. > >> >>>> > >> >>>> Is there anything to > recommend the > hacks mentioned for fixed sized > >> >>>> array > >> >>>> objects *other* than using > them to > fake structs? (Much to > >> >>>> derecommend, as > >> >>>> you mentioned!) > >> >>>> > >> >>>> On Fri, Aug 28, 2015 at > 3:07 PM > Edward Kmett > >> > >> >>>> wrote: > >> >>>>> > >> >>>>> I think both are useful, > but the > one you suggest requires a lot more > >> >>>>> plumbing and doesn't > subsume all > of the usecases of the other. > >> >>>>> > >> >>>>> -Edward > >> >>>>> > >> >>>>> On Fri, Aug 28, 2015 at > 5:51 PM, > Ryan Newton > >> > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> So that primitive is an > array > like thing (Same pointed type, > >> >>>>>> unbounded > >> >>>>>> length) with extra payload. > >> >>>>>> > >> >>>>>> I can see how we can do > without > structs if we have arrays, > >> >>>>>> especially > >> >>>>>> with the extra payload > at front. > But wouldn't the general solution > >> >>>>>> for > >> >>>>>> structs be one that that > allows > new user data type defs for # > >> >>>>>> types? > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On Fri, Aug 28, 2015 at > 4:43 PM > Edward Kmett > >> > >> >>>>>> wrote: > >> >>>>>>> > >> >>>>>>> Some form of > MutableStruct# with > a known number of words and a > >> >>>>>>> known > >> >>>>>>> number of pointers is > basically > what Ryan Yates was suggesting > >> >>>>>>> above, but > >> >>>>>>> where the word counts were > stored in the objects themselves. 
> >> >>>>>>> > >> >>>>>>> Given that it'd have a > couple of > words for those counts it'd > >> >>>>>>> likely > >> >>>>>>> want to be something we > build in > addition to MutVar# rather than a > >> >>>>>>> replacement. > >> >>>>>>> > >> >>>>>>> On the other hand, if > we had to > fix those numbers and build info > >> >>>>>>> tables that knew them, and > typechecker support, for instance, it'd > >> >>>>>>> get > >> >>>>>>> rather invasive. > >> >>>>>>> > >> >>>>>>> Also, a number of > things that we > can do with the 'sized' versions > >> >>>>>>> above, like working > with evil > unsized c-style arrays directly > >> >>>>>>> inline at the > >> >>>>>>> end of the structure > cease to be > possible, so it isn't even a pure > >> >>>>>>> win if we > >> >>>>>>> did the engineering effort. > >> >>>>>>> > >> >>>>>>> I think 90% of the > needs I have > are covered just by adding the one > >> >>>>>>> primitive. The last 10% > gets > pretty invasive. > >> >>>>>>> > >> >>>>>>> -Edward > >> >>>>>>> > >> >>>>>>> On Fri, Aug 28, 2015 at > 5:30 PM, > Ryan Newton > >> > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> I like the possibility > of a > general solution for mutable structs > >> >>>>>>>> (like Ed said), and > I'm trying > to fully understand why it's hard. > >> >>>>>>>> > >> >>>>>>>> So, we can't unpack > MutVar into > constructors because of object > >> >>>>>>>> identity problems. But > what > about directly supporting an > >> >>>>>>>> extensible set of > >> >>>>>>>> unlifted MutStruct# > objects, > generalizing (and even replacing) > >> >>>>>>>> MutVar#? That > >> >>>>>>>> may be too much work, > but is it > problematic otherwise? > >> >>>>>>>> > >> >>>>>>>> Needless to say, this > is also > critical if we ever want best in > >> >>>>>>>> class > >> >>>>>>>> lockfree mutable > structures, > just like their Stm and sequential > >> >>>>>>>> counterparts. 
> >> >>>>>>>> > >> >>>>>>>> On Fri, Aug 28, 2015 > at 4:43 AM > Simon Peyton Jones > >> >>>>>>>> > >> wrote: > >> >>>>>>>>> > >> >>>>>>>>> At the very least > I'll take > this email and turn it into a short > >> >>>>>>>>> article. > >> >>>>>>>>> > >> >>>>>>>>> Yes, please do make > it into a > wiki page on the GHC Trac, and > >> >>>>>>>>> maybe > >> >>>>>>>>> make a ticket for it. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Thanks > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Simon > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> From: Edward Kmett > [mailto:ekmett at gmail.com > > >] > >> >>>>>>>>> Sent: 27 August 2015 > 16:54 > >> >>>>>>>>> To: Simon Peyton Jones > >> >>>>>>>>> Cc: Manuel M T > Chakravarty; > Simon Marlow; ghc-devs > >> >>>>>>>>> Subject: Re: ArrayArrays > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> An ArrayArray# is just an > Array# with a modified invariant. It > >> >>>>>>>>> points directly to other > unlifted ArrayArray#'s or ByteArray#'s. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> While those live in > #, they > are garbage collected objects, so > >> >>>>>>>>> this > >> >>>>>>>>> all lives on the heap. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> They were added to > make some > of the DPH stuff fast when it has > >> >>>>>>>>> to > >> >>>>>>>>> deal with nested arrays. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> I'm currently abusing > them as > a placeholder for a better thing. > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The Problem > >> >>>>>>>>> > >> >>>>>>>>> ----------------- > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> Consider the scenario > where > you write a classic doubly-linked > >> >>>>>>>>> list > >> >>>>>>>>> in Haskell. 
From kane at kane.cx Tue Sep 8 10:12:33 2015
From: kane at kane.cx (David Kraeutmann)
Date: Tue, 8 Sep 2015 12:12:33 +0200
Subject: AnonymousSums data con syntax
In-Reply-To: 
References: <9eb2c9041f6142ce947a4b323c0b2bff@DB4PR30MB030.064d.mgd.msft.net> <1441657274.28403.7.camel@joachim-breitner.de>
Message-ID: 

For what it's worth, I feel like (|True|||) looks better than (2/5|True)
or (2 of 5|True). Not sure whether the confusion between (x||) as an
or-section and as a 3-ary anonymous sum is worth it, though.

On Tue, Sep 8, 2015 at 10:28 AM, Simon Peyton Jones wrote:
> I can see the force of this discussion about data type constructors for
> sums, but
>
> - We already do this for tuples: (,,,,) is a type constructor and
>   you have to count commas. We could use a number here but we don't.
>
> - Likewise tuple sections. (,,e,) means (\x y z. (x,y,e,z))
>
> I do not expect big sums in practice.
>
> That said, (2/5| True) instead of (|True|||) would be ok I suppose. Or
> something like that.
> Simon
>
> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Lennart Kolmodin
> Sent: 08 September 2015 07:12
> To: Joachim Breitner
> Cc: ghc-devs at haskell.org
> Subject: Re: AnonymousSums data con syntax
>
> 2015-09-07 21:21 GMT+01:00 Joachim Breitner :
>
> Hi,
>
> On Monday, 07.09.2015 at 19:25 +0000, Simon Peyton Jones wrote:
>> > Are we okay with stealing some operator sections for this? E.g.
>> > (x ||). I think the boxed sums larger than 2 choices are all
>> > technically overlapping with sections.
>>
>> I hadn't thought of that. I suppose that in distfix notation we
>> could require spaces
>> (x | |)
>> since vertical bar by itself isn't an operator. But then (_||) x
>> might feel more compact.
>>
>> Also a section (x ||) isn't valid in a pattern, so we would not need
>> to require spaces there.
>>
>> But my gut feel is: yes, with AnonymousSums we should just steal the
>> syntax. It won't hurt existing code (since it won't use
>> AnonymousSums), and if you *are* using AnonymousSums then the distfix
>> notation is probably more valuable than the sections for an operator
>> you probably aren't using.
>
> I wonder if this syntax for constructors is really that great. Yes, there
> is similarity with the type constructor (which is nice), but for
> the data constructor, do we really want a unary encoding and have our
> users count bars?
>
> I believe the user (and also us, having to read Core) would be better
> served by some syntax that involves plain numbers.
>
> I reacted the same way to the proposed syntax.
>
> Imagine already having an anonymous sum type and then deciding to add
> another constructor. Naturally you'd have to update your code to handle the
> new constructor, but you would also need to update the code for all the other
> constructors as well, by adding another bar in the right place. That seems
> unnecessary, and there's no need to do that for named sum types.
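Lennart's maintenance point can be seen with a library-level stand-in in today's Haskell. The names below (Sum3, S0, S1, S2, describe) are hypothetical, invented for this sketch: a nested-Either encoding hidden behind index-style pattern synonyms, so use sites name the summand's position rather than counting bars, and widening the sum only touches the synonym definitions.

```haskell
{-# LANGUAGE PatternSynonyms #-}
-- Hypothetical stand-in for a 3-way anonymous sum: the representation
-- is nested Either, the interface is index-style patterns, in the
-- spirit of the "(0 | x)" proposal above. Not anything GHC implements.
type Sum3 a b c = Either a (Either b c)

pattern S0 :: a -> Sum3 a b c
pattern S0 x = Left x

pattern S1 :: b -> Sum3 a b c
pattern S1 x = Right (Left x)

pattern S2 :: c -> Sum3 a b c
pattern S2 x = Right (Right x)

{-# COMPLETE S0, S1, S2 #-}

-- Use sites mention an index, never a bar pattern such as (|x|).
describe :: Sum3 String Int Bool -> String
describe (S0 s) = "string: " ++ s
describe (S1 n) = "int: " ++ show n
describe (S2 b) = "bool: " ++ show b
```

Extending `Sum3` to a fourth summand would change only the synonym bodies; every existing `S0`/`S1`/`S2` match stays valid, which is exactly the property the bar-counting syntax lacks.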
> What about explicitly stating the index as a number?
>
> (1 | Int) :: ( String | Int | Bool )
> (# 1 | Int #) :: (# String | Int | Bool #)
>
> case sum of
>   (0 | myString) -> ...
>   (1 | myInt)    -> ...
>   (2 | myBool)   -> ...
>
> This allows you to at least add new constructors at the end without changing
> existing code.
>
> Is it harder to resolve by type inference since we're not stating the number
> of constructors? If so we could do something similar to Joachim's proposal:
>
> case sum of
>   (0 of 3 | myString) -> ...
>   (1 of 3 | myInt)    -> ...
>   (2 of 3 | myBool)   -> ...
>
> ... and at least you don't have to count bars.
>
> Given that "of" is already a keyword, how about something involving "3
> of 4"? For example
>
> (Put# True in 3 of 5) :: (# a | b | Bool | d | e #)
>
> and
>
> case sum of
>   (Put# x in 1 of 3) -> ...
>   (Put# x in 2 of 3) -> ...
>   (Put# x in 3 of 3) -> ...
>
> (If "as" were a keyword, (Put# x as 2 of 3) would sound even better.)
>
> I don't find this particular choice very great, but something with
> numbers rather than ASCII art seems to make more sense here. Is there
> something even better?
>
> Greetings,
> Joachim
>
> --
> Joachim "nomeata" Breitner
> mail at joachim-breitner.de - http://www.joachim-breitner.de/
> Jabber: nomeata at joachim-breitner.de -
GPG-Key: 0xF0FBF51F
> Debian Developer: nomeata at debian.org
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From simonpj at microsoft.com Tue Sep 8 11:48:21 2015
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Tue, 8 Sep 2015 11:48:21 +0000
Subject: Unpacking sum types
In-Reply-To: <55EEA24C.9080504@gmail.com>
References: <55EE93DC.7050409@gmail.com> <55EEA24C.9080504@gmail.com>
Message-ID: 

| I see, but then you can't have multiple fields, like
|
| ( (# Int,Bool #) |)
|
| You'd have to box the inner tuple too. Ok, I suppose.

Well of course! It's just a parameterised data type, like a tuple. But, just like unboxed tuples, you could have an unboxed tuple (or sum) inside an unboxed tuple.

(# (# Int,Bool #) | Int #)

Simon

| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 08 September 2015 09:55
| To: Simon Peyton Jones; Johan Tibell; Ryan Newton
| Cc: ghc-devs at haskell.org
| Subject: Re: Unpacking sum types
|
| On 08/09/2015 09:31, Simon Peyton Jones wrote:
| > | How did you envisage implementing anonymous boxed sums? What is
| > | their heap representation?
| >
| > *Exactly* like tuples; that is, we have a family of data type
| > declarations:
| >
| > data (a|b) = (_|) a
| >            | (|_) b
| >
| > data (a|b|c) = (_||) a
| >              | (|_|) b
| >              | (||_) c
| > ..etc.
|
| I see, but then you can't have multiple fields, like
|
| ( (# Int,Bool #) |)
|
| You'd have to box the inner tuple too. Ok, I suppose.
|
| Cheers
| Simon
|
| > Simon
| >
| > | One option is to use some kind of generic object with a dynamic
| > | number of pointers and non-pointers, and one field for the tag.
| > | The layout would need to be stored in the object.
| > | This isn't a particularly efficient representation, though.
| > | Perhaps there could be a family of smaller specialised versions
| > | for common sizes.
| > |
| > | Do we have a use case for the boxed version, or is it just for
| > | consistency?
| > |
| > | Cheers
| > | Simon
| > |
| > | > Looks good to me!
| > | >
| > | > Simon
| > | >
| > | > From: Johan Tibell [mailto:johan.tibell at gmail.com]
| > | > Sent: 01 September 2015 18:24
| > | > To: Simon Peyton Jones; Simon Marlow; Ryan Newton
| > | > Cc: ghc-devs at haskell.org
| > | > Subject: RFC: Unpacking sum types
| > | >
| > | > I have a draft design for unpacking sum types that I'd like some
| > | > feedback on. In particular feedback both on:
| > | >
| > | > * the writing and clarity of the proposal and
| > | >
| > | > * the proposal itself.
| > | >
| > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes
| > | >
| > | > -- Johan

From simonpj at microsoft.com Tue Sep 8 11:50:27 2015
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Tue, 8 Sep 2015 11:50:27 +0000
Subject: ArrayArrays
In-Reply-To: 
References: <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> <55EE90ED.1040609@gmail.com>
Message-ID: <1634726badf84376a7a583283a76ac4e@DB4PR30MB030.064d.mgd.msft.net>

I'm not willing to fight too hard for it, but it feels more like the "right" solution than retaining a cut-and-paste copy of the same code and bifurcating further on each argument you want to consider such a degree of freedom.

Like I say, I'm not against allowing polymorphism over unlifted-but-boxed types, and I can see the advantages. But it's a separate proposal in its own right.
Simon From: Edward Kmett [mailto:ekmett at gmail.com] Sent: 08 September 2015 09:30 To: Simon Marlow Cc: Simon Peyton Jones; Ryan Newton; Johan Tibell; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates Subject: Re: ArrayArrays Once you start to include all the other primitive types there is a bit more of an explosion. MVar#, TVar#, MutVar#, Small variants, etc. can all be modified to carry unlifted content. Being able to be parametric over that choice would permit a number of things in user land to do the same thing with an open-ended set of design possibilities that are rather hard to contemplate in advance. e.g. being able to abstract over them could let you just use a normal (,) to carry around unlifted parametric data types or being able to talk about [MVar# s a] drastically reducing the number of one off data types we need to invent. If you can talk about the machinery mentioned above then you can have typeclasses parameterized on an argument that could be either unlifted or lifted. I'm not willing to fight too hard for it, but it feels more like the "right" solution than retaining a cut-and-paste copy of the same code and bifurcating further on each argument you want to consider such a degree of freedom. As such it seems like a pretty big win for a comparatively minor change to the levity polymorphism machinery. -Edward On Tue, Sep 8, 2015 at 3:40 AM, Simon Marlow > wrote: This would be very cool, however it's questionable whether it's worth it. Without any unlifted kind, we need - ArrayArray# - a set of new/read/write primops for every element type, either built-in or made from unsafeCoerce# With the unlifted kind, we would need - ArrayArray# - one set of new/read/write primops With levity polymorphism, we would need - none of this, Array# can be used So having an unlifted kind already kills a lot of the duplication, polymorphism only kills a bit more. 
Cheers
Simon

On 08/09/2015 00:14, Edward Kmett wrote:

Assume we had the ability to talk about Levity in a new way and, instead of just:

data Levity = Lifted | Unlifted

type * = TYPE 'Lifted
type # = TYPE 'Unlifted

we instead had a more nuanced notion of TYPE parameterized on another data type:

data Levity = Lifted | Unlifted
data Param = Composite | Simple Levity

and we parameterized TYPE with a Param rather than a Levity.

Existing strange representations can continue to live in TYPE 'Composite

(# Int#, Double #) :: TYPE 'Composite

and we don't support parametricity in there, just like, currently, we don't allow parametricity in #.

We can include the undefined example from Richard's talk:

undefined :: forall (v :: Param). v

and ultimately lift it into his pi type when it is available, just as before.

But we could consider TYPE ('Simple 'Unlifted) as a form of 'parametric #' covering unlifted things we're willing to allow polymorphism over, because they are just pointers to something in the heap that just happens to not be able to be _|_ or a thunk.

In this setting, recalling that above I modified Richard's TYPE to take a Param instead of a Levity, we can define a type alias for things that live as a simple pointer to a heap-allocated object:

type GC (l :: Levity) = TYPE ('Simple l)
type * = GC 'Lifted

and then we can look at existing primitives generalized:

Array# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted
MutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted
SmallArray# :: forall (l :: Levity) (a :: GC l). a -> GC 'Unlifted
SmallMutableArray# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted
MutVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted
MVar# :: forall (l :: Levity) (a :: GC l). * -> a -> GC 'Unlifted

Weak#, StablePtr#, StableName#, etc. can all take similar modifications.

Recall that an ArrayArray# was just an Array# hacked up to be able to hold onto the subset of # that is collectable.
Almost all of the operations on these data types can work on the more general kind of argument. newArray# :: forall (s :: *) (l :: Levity) (a :: GC l). Int# -> a -> State# s -> (# State# s, MutableArray# s a #) writeArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s a -> Int# -> a -> State# s -> State# s readArray# :: forall (s :: *) (l :: Levity) (a :: GC l). MutableArray# s a -> Int# -> State# s -> (# State# s, a #) etc. Only a couple of our existing primitives _can't_ generalize this way. The one that leaps to mind is atomicModifyMutVar, which would need to stay constrained to only work on arguments in *, because of the way it operates. With that we can still talk about MutableArray# s Int but now we can also talk about: MutableArray# s (MutableArray# s Int) without the layer of indirection through a box in * and without an explosion of primops. The same newFoo, readFoo, writeFoo machinery works for both kinds. The struct machinery doesn't get to take advantage of this, but it would let us clean house elsewhere in Prim and drastically improve the range of applicability of the existing primitives with nothing more than a small change to the levity machinery. I'm not attached to any of the names above, I coined them just to give us a concrete thing to talk about. Here I'm only proposing we extend machinery in GHC.Prim this way, but an interesting 'now that the barn door is open' question is to consider that our existing Haskell data types often admit a similar form of parametricity and nothing in principle prevents this from working for Maybe or [] and once you permit inference to fire across all of GC l then it seems to me that you'd start to get those same capabilities there as well when LevityPolymorphism was turned on. 
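The l ~ 'Lifted instantiation of the generalised signatures above is exactly today's API, and it can be exercised through the usual one-constructor "# -> *" wrapper. A minimal sketch follows; the names Arr, newArr, readArr and writeArr are ours (invented for this example), not from any library, and only lifted element types are involved.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- A boxed wrapper over MutableArray#, i.e. the lifted case of the
-- generalised primitives discussed above.
import GHC.Exts (Int (I#), MutableArray#, RealWorld,
                 newArray#, readArray#, writeArray#)
import GHC.IO (IO (..))

data Arr a = Arr (MutableArray# RealWorld a)

-- Allocate an array of the given size, filled with an initial element.
newArr :: Int -> a -> IO (Arr a)
newArr (I# n) x = IO $ \s -> case newArray# n x s of
  (# s', m #) -> (# s', Arr m #)

-- Read the element at an index (no bounds checking, like the primop).
readArr :: Arr a -> Int -> IO a
readArr (Arr m) (I# i) = IO $ \s -> readArray# m i s

-- Write an element at an index.
writeArr :: Arr a -> Int -> a -> IO ()
writeArr (Arr m) (I# i) x = IO $ \s -> case writeArray# m i x s of
  s' -> (# s', () #)
```

Under the proposal, the same three wrappers would typecheck unchanged when `a` is instantiated to an unlifted heap pointer, which is the point of making the primops levity-parametric rather than duplicating them per element kind.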
-Edward

On Mon, Sep 7, 2015 at 5:56 PM, Simon Peyton Jones wrote:

This could make the menagerie of ways to pack {Small}{Mutable}Array{Array}# references into a {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the need for folks to descend into the use of the more evil structure primitives we're talking about, and letting us keep a few more principles around us.

I'm lost. Can you give some concrete examples that illustrate how levity polymorphism will help us?

Simon

From: Edward Kmett [mailto:ekmett at gmail.com]
Sent: 07 September 2015 21:17
To: Simon Peyton Jones
Cc: Ryan Newton; Johan Tibell; Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates
Subject: Re: ArrayArrays

I had a brief discussion with Richard during the Haskell Symposium about how we might be able to let parametricity help a bit in reducing the space of necessary primops to a slightly more manageable level.

Notably, it'd be interesting to explore the ability to allow parametricity over the portion of # that is just a gcptr.

We could do this if the levity polymorphism machinery was tweaked a bit. You could envision the ability to abstract over things in both * and the subset of # that are represented by a gcptr, then modifying the existing array primitives to be parametric in that choice of levity for their argument so long as it was of a "heap object" levity.

This could make the menagerie of ways to pack {Small}{Mutable}Array{Array}# references into a {Small}{Mutable}Array{Array}#' actually typecheck soundly, reducing the need for folks to descend into the use of the more evil structure primitives we're talking about, and letting us keep a few more principles around us.

Then in the cases like `atomicModifyMutVar#` where it needs to actually be in * rather than just a gcptr, due to the constructed field selectors it introduces on the heap, we could keep the existing less polymorphic type.

-Edward

On Mon, Sep 7, 2015 at 9:59 AM, Simon Peyton Jones wrote:

It was fun to meet and discuss this.

Did someone volunteer to write a wiki page that describes the proposed design? And, I earnestly hope, also describes the menagerie of currently available array types and primops so that users can have some chance of picking the right one?!

Thanks

Simon

From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ryan Newton
Sent: 31 August 2015 23:11
To: Edward Kmett; Johan Tibell
Cc: Simon Marlow; Manuel M T Chakravarty; Chao-Hong Chen; ghc-devs; Ryan Scott; Ryan Yates
Subject: Re: ArrayArrays

Dear Edward, Ryan Yates, and other interested parties --

So when should we meet up about this?

May I propose the Tues afternoon break for everyone at ICFP who is interested in this topic? We can meet out in the coffee area and congregate around Edward Kmett, who is tall and should be easy to find ;-).

I think Ryan is going to show us how to use his new primops for combined array + other fields in one heap object?

On Sat, Aug 29, 2015 at 9:24 PM Edward Kmett wrote:

Without a custom primitive it doesn't help much there, you have to store the indirection to the mask.

With a custom primitive it should cut the on-heap root-to-leaf path of everything in the HAMT in half. A shorter HashMap was actually one of the motivating factors for me doing this. It is rather astoundingly difficult to beat the performance of HashMap, so I had to start cheating pretty badly. ;)

-Edward

On Sat, Aug 29, 2015 at 5:45 PM, Johan Tibell wrote:

I'd also be interested to chat at ICFP to see if I can use this for my HAMT implementation.

On Sat, Aug 29, 2015 at 3:07 PM, Edward Kmett wrote:

Sounds good to me. Right now I'm just hacking up composable accessors for "typed slots" in a fairly lens-like fashion, and treating the set of slots I define and the 'new' function I build for the data type as its API, and build atop that. This could eventually graduate to template-haskell, but I'm not entirely satisfied with the solution I have. I currently distinguish between what I'm calling "slots" (things that point directly to another SmallMutableArrayArray# sans wrapper) and "fields" which point directly to the usual Haskell data types, because unifying the two notions meant that I couldn't lift some coercions out "far enough" to make them vanish.

I'll be happy to run through my current working set of issues in person and -- as things get nailed down further -- in a longer-lived medium than in personal conversations. ;)

-Edward

On Sat, Aug 29, 2015 at 7:59 AM, Ryan Newton wrote:

I'd also love to meet up at ICFP and discuss this. I think the array primops plus a TH layer that lets us (ab)use them many times without too much marginal cost sounds great. And I'd like to learn how we could be either early users of, or help with, this infrastructure.

CC'ing in Ryan Scott and Omer Agacan, who may also be interested in dropping in on such discussions @ICFP, and Chao-Hong Chen, a Ph.D. student who is currently working on concurrent data structures in Haskell, but will not be at ICFP.

On Fri, Aug 28, 2015 at 7:47 PM, Ryan Yates wrote:

I completely agree. I would love to spend some time during ICFP and friends talking about what it could look like. My small array for STM changes for the RTS can be seen here [1]. It is on a branch somewhere between 7.8 and 7.10 and includes irrelevant STM bits and some confusing naming choices (sorry), but should cover all the details needed to implement it for a non-STM context. The biggest surprise for me was following small array too closely and having a word/byte offset mismatch [2].

[1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut
[2]: https://ghc.haskell.org/trac/ghc/ticket/10413

Ryan

On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett wrote:
> I'd love to have that last 10%, but it's a lot of work to get there and more
> importantly I don't know quite what it should look like.
>
> On the other hand, I do have a pretty good idea of how the primitives above
> could be banged out and tested in a long evening, well in time for 7.12. And
> as noted earlier, those remain useful even if a nicer typed version with an
> extra level of indirection to the sizes is built up after.
>
> The rest sounds like a good graduate student project for someone who has
> graduate students lying around. Maybe somebody at Indiana University who has
> an interest in type theory and parallelism can find us one.
=) > > -Edward > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates >> wrote: >> >> I think from my perspective, the motivation for getting the type >> checker involved is primarily bringing this to the level where users >> could be expected to build these structures. it is reasonable to >> think that there are people who want to use STM (a context with >> mutation already) to implement a straight forward data structure that >> avoids extra indirection penalty. There should be some places where >> knowing that things are field accesses rather then array indexing >> could be helpful, but I think GHC is good right now about handling >> constant offsets. In my code I don't do any bounds checking as I know >> I will only be accessing my arrays with constant indexes. I make >> wrappers for each field access and leave all the unsafe stuff in >> there. When things go wrong though, the compiler is no help. Maybe >> template Haskell that generates the appropriate wrappers is the right >> direction to go. >> There is another benefit for me when working with these as arrays in >> that it is quite simple and direct (given the hoops already jumped >> through) to play with alignment. I can ensure two pointers are never >> on the same cache-line by just spacing things out in the array. >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett >> wrote: >> > They just segfault at this level. ;) >> > >> > Sent from my iPhone >> > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton >> wrote: >> > >> > You presumably also save a bounds check on reads by hard-coding the >> > sizes? >> > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett >> wrote: >> >> >> >> Also there are 4 different "things" here, basically depending on two >> >> independent questions: >> >> >> >> a.) if you want to shove the sizes into the info table, and >> >> b.) if you want cardmarking. 
>> >> >> >> Versions with/without cardmarking for different sizes can be done >> >> pretty >> >> easily, but as noted, the infotable variants are pretty invasive. >> >> >> >> -Edward >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett >> wrote: >> >>> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up >> >>> if >> >>> they were small enough and there are enough of them. You get a bit >> >>> better >> >>> locality of reference in terms of what fits in the first cache line of >> >>> them. >> >>> >> >>> -Edward >> >>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton >> >> >>> wrote: >> >>>> >> >>>> Yes. And for the short term I can imagine places we will settle with >> >>>> arrays even if it means tracking lengths unnecessarily and >> >>>> unsafeCoercing >> >>>> pointers whose types don't actually match their siblings. >> >>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed sized >> >>>> array >> >>>> objects *other* than using them to fake structs? (Much to >> >>>> derecommend, as >> >>>> you mentioned!) >> >>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett >> >> >>>> wrote: >> >>>>> >> >>>>> I think both are useful, but the one you suggest requires a lot more >> >>>>> plumbing and doesn't subsume all of the usecases of the other. >> >>>>> >> >>>>> -Edward >> >>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton >> >> >>>>> wrote: >> >>>>>> >> >>>>>> So that primitive is an array like thing (Same pointed type, >> >>>>>> unbounded >> >>>>>> length) with extra payload. >> >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, >> >>>>>> especially >> >>>>>> with the extra payload at front. But wouldn't the general solution >> >>>>>> for >> >>>>>> structs be one that that allows new user data type defs for # >> >>>>>> types? 
>> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett >> >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >> >>>>>>> known >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >> >>>>>>> above, but >> >>>>>>> where the word counts were stored in the objects themselves. >> >>>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >> >>>>>>> likely >> >>>>>>> want to be something we build in addition to MutVar# rather than a >> >>>>>>> replacement. >> >>>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and build info >> >>>>>>> tables that knew them, and typechecker support, for instance, it'd >> >>>>>>> get >> >>>>>>> rather invasive. >> >>>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' versions >> >>>>>>> above, like working with evil unsized c-style arrays directly >> >>>>>>> inline at the >> >>>>>>> end of the structure cease to be possible, so it isn't even a pure >> >>>>>>> win if we >> >>>>>>> did the engineering effort. >> >>>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by adding the one >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >>>>>>> >> >>>>>>> -Edward >> >>>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton >> >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable structs >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's hard. >> >>>>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of object >> >>>>>>>> identity problems. But what about directly supporting an >> >>>>>>>> extensible set of >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) >> >>>>>>>> MutVar#? That >> >>>>>>>> may be too much work, but is it problematic otherwise? 
>> >>>>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want best in >> >>>>>>>> class >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential >> >>>>>>>> counterparts. >> >>>>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >>>>>>>> >> wrote: >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >> >>>>>>>>> maybe >> >>>>>>>>> make a ticket for it. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Thanks >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Simon >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekmett at gmail.com >] >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >>>>>>>>> To: Simon Peyton Jones >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> While those live in #, they are garbage collected objects, so >> >>>>>>>>> this >> >>>>>>>>> all lives on the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has >> >>>>>>>>> to >> >>>>>>>>> deal with nested arrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The Problem >> >>>>>>>>> >> >>>>>>>>> ----------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked >> >>>>>>>>> list >> >>>>>>>>> in Haskell. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers >> >>>>>>>>> on >> >>>>>>>>> the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> >> >>>>>>>>> Maybe >> >>>>>>>>> DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> That is 3 levels of indirection. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >> >>>>>>>>> worsening our representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This means that every operation we perform on this structure >> >>>>>>>>> will >> >>>>>>>>> be about half of the speed of an implementation in most other >> >>>>>>>>> languages >> >>>>>>>>> assuming we're memory bound on loading things into cache! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Making Progress >> >>>>>>>>> >> >>>>>>>>> ---------------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I have been working on a number of data structures where the >> >>>>>>>>> indirection of going from something in * out to an object in # >> >>>>>>>>> which >> >>>>>>>>> contains the real pointer to my target and coming back >> >>>>>>>>> effectively doubles >> >>>>>>>>> my runtime. 
>> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >> >>>>>>>>> MutVar# >> >>>>>>>>> onto the mutable list when we dirty it. There is a well-defined >> >>>>>>>>> write-barrier. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I could change out the representation to use >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, >> >>>>>>>>> but >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount of >> >>>>>>>>> distinct >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per >> >>>>>>>>> object to 2. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the >> >>>>>>>>> array >> >>>>>>>>> object and then chase it to the next DLL and chase that to the >> >>>>>>>>> next array. I >> >>>>>>>>> do get my two pointers together in memory though. I'm paying for >> >>>>>>>>> a card >> >>>>>>>>> marking table as well, which I don't particularly need with just >> >>>>>>>>> two >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" >> >>>>>>>>> machinery added >> >>>>>>>>> back in 7.10, which is just the old array code as a new data >> >>>>>>>>> type, which can >> >>>>>>>>> speed things up a bit when you don't have very big arrays: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and have two >> >>>>>>>>> mutable fields and be able to share the same write barrier? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. 
>> >>>>>>>>> What >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the >> >>>>>>>>> impedance >> >>>>>>>>> mismatch between the imperative world and Haskell, and then just >> >>>>>>>>> let the >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a >> >>>>>>>>> special >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even >> >>>>>>>>> abuse pattern >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to >> >>>>>>>>> make this >> >>>>>>>>> cheaper. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding >> >>>>>>>>> and next >> >>>>>>>>> entry in the linked list. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a >> >>>>>>>>> strict world, and everything there lives in #. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> next :: DLL -> IO DLL >> >>>>>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# m 1# s of >> >>>>>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty >> >>>>>>>>> easily when they >> >>>>>>>>> are known strict and you chain -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mail at andres-loeh.de Tue Sep 8 11:59:42 2015 From: mail at andres-loeh.de (Andres Loeh) Date: Tue, 8 Sep 2015 13:59:42 +0200 Subject: Proposal: Automatic derivation of Lift In-Reply-To: References: Message-ID: I don't think there's any fundamental reason why unboxed fields prevent a Generic instance, as long as we're happy that unboxed values will be re-boxed in the generic representation. It simply seems as if nobody has thought of implementing this. As an example, consider the following hand-written example which works just fine: {-# LANGUAGE MagicHash, KindSignatures, PolyKinds, TypeOperators, TypeFamilies #-} module GenUnboxed where import GHC.Exts import GHC.Generics import Generics.Deriving.Eq data UPair = UPair Int# Char# instance Generic UPair where type Rep UPair = K1 R Int :*: K1 R Char from (UPair x y) = K1 (I# x) :*: K1 (C# y) to (K1 (I# x) :*: K1 (C# y)) = UPair x y instance GEq UPair test :: Bool test = let p = UPair 3# 'x'# in geq p p Cheers, Andres On Mon, Sep 7, 2015 at 10:02 PM, Ryan Scott wrote: > Unlifted types can't be used polymorphically or in instance > declarations, so this makes it impossible to do something like > > instance Generic Int# > > or store an Int# in one branch of a (:*:), preventing generics from > doing anything in #-land. (unless someone has found a way to hack > around this). > > I would be okay with implementing a generics-based approach, but we'd > have to add a caveat that it will only work out-of-the-box on GHC 8.0 > or later, due to TH's need to look up package information. (We could > give users the ability to specify a package name manually as a > workaround.) > > If this were added, where would be the best place to put it? th-lift? > generic-deriving? template-haskell? A new package (lift-generics)? > > Ryan S. > > On Mon, Sep 7, 2015 at 3:10 PM, Matthew Pickering > wrote: >> Continuing my support of the generics route. Is there a fundamental >> reason why it couldn't handle unlifted types? 
Given their relative >> paucity, it seems like a fair compromise to generically define lift >> instances for all normal data types but require TH for unlifted types. >> This approach seems much smoother from a maintenance perspective. >> >> On Mon, Sep 7, 2015 at 5:26 PM, Ryan Scott wrote: >>> There is a Lift typeclass defined in template-haskell [1] which, when >>> a data type is an instance, permits it to be directly used in a TH >>> quotation, like so >>> >>> data Example = Example >>> >>> instance Lift Example where >>> lift Example = conE (mkNameG_d "" "" "Example") >>> >>> e :: Example >>> e = [| Example |] >>> >>> Making Lift instances for most data types is straightforward and >>> mechanical, so the proposal is to allow automatic derivation of Lift >>> via a -XDeriveLift extension: >>> >>> data Example = Example deriving Lift >>> >>> This is actually a pretty old proposal [2], dating back to >>> 2007. I wanted to have this feature for my needs, so I submitted a >>> proof-of-concept at the GHC Trac issue page [3]. >>> >>> The question now is: do we really want to bake this feature into GHC? >>> Since not many people opined on the Trac page, I wanted to submit this >>> here for wider visibility and to have a discussion. >>> >>> Here are some arguments I have heard against this feature (please tell >>> me if I am misrepresenting your opinion): >>> >>> * We already have a th-lift package [4] on Hackage which allows >>> derivation of Lift via Template Haskell functions. In addition, if >>> you're using Lift, chances are you're also using the -XTemplateHaskell >>> extension in the first place, so th-lift should be suitable. >>> * The same functionality could be added via GHC generics (as of GHC >>> 7.12/8.0, which adds the ability to reify a datatype's package name >>> [5]), if -XTemplateHaskell can't be used. 
>>> * Adding another -XDerive- extension places a burden on GHC devs to >>> maintain it in the future in response to further Template Haskell >>> changes. >>> >>> Here are my (opinionated) responses to each of these: >>> >>> * th-lift isn't as fully-featured as a -XDerive- extension at the >>> moment, since it can't do sophisticated type inference [6] or derive >>> for data families. This is something that could be addressed with a >>> patch to th-lift, though. >>> * GHC generics wouldn't be enough to handle unlifted types like Int#, >>> Char#, or Double# (which other -XDerive- extensions do). >>> * This is a subjective measurement, but in terms of the amount of code >>> I had to add, -XDeriveLift was substantially simpler than other >>> -XDerive extensions, because there are fewer weird corner cases. Plus, >>> I'd volunteer to maintain it :) >>> >>> Simon PJ wanted to know if other Template Haskell programmers would >>> find -XDeriveLift useful. Would you be able to use it? Would you like >>> to see a solution other than putting it into GHC? I'd love to hear >>> feedback so we can bring some closure to this 8-year-old feature >>> request. >>> >>> Ryan S. 
>>> >>> ----- >>> [1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift >>> [2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html >>> [3] https://ghc.haskell.org/trac/ghc/ticket/1830 >>> [4] http://hackage.haskell.org/package/th-lift >>> [5] https://ghc.haskell.org/trac/ghc/ticket/10030 >>> [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11 >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simonpj at microsoft.com Tue Sep 8 12:03:05 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 12:03:05 +0000 Subject: Proposal: Automatic derivation of Lift In-Reply-To: References: Message-ID: <6962f286ec8346f6866f25b7112352f6@DB4PR30MB030.064d.mgd.msft.net> | I don't think there's any fundamental reason why unboxed fields | prevent a Generic instance, as long as we're happy that unboxed values | will be re-boxed in the generic representation. It simply seems as if Interesting and quite reasonable idea, as an extension to `deriving(Generic)`. Make a ticket? Simon | -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of | Andres Loeh | Sent: 08 September 2015 13:00 | To: Ryan Scott | Cc: GHC developers | Subject: Re: Proposal: Automatic derivation of Lift | | I don't think there's any fundamental reason why unboxed fields | prevent a Generic instance, as long as we're happy that unboxed values | will be re-boxed in the generic representation. It simply seems as if | nobody has thought of implementing this. 
As an example, consider the | following hand-written example which works just fine: | | {-# LANGUAGE MagicHash, KindSignatures, PolyKinds, TypeOperators, | TypeFamilies #-} module GenUnboxed where | | import GHC.Exts | import GHC.Generics | import Generics.Deriving.Eq | | data UPair = UPair Int# Char# | | instance Generic UPair where | type Rep UPair = K1 R Int :*: K1 R Char | from (UPair x y) = K1 (I# x) :*: K1 (C# y) | to (K1 (I# x) :*: K1 (C# y)) = UPair x y | | instance GEq UPair | | test :: Bool | test = let p = UPair 3# 'x'# in geq p p | | Cheers, | Andres | | On Mon, Sep 7, 2015 at 10:02 PM, Ryan Scott | wrote: | > Unlifted types can't be used polymorphically or in instance | > declarations, so this makes it impossible to do something like | > | > instance Generic Int# | > | > or store an Int# in one branch of a (:*:), preventing generics from | > doing anything in #-land. (unless someone has found a way to hack | > around this). | > | > I would be okay with implementing a generics-based approach, but | we'd | > have to add a caveat that it will only work out-of-the-box on GHC | 8.0 | > or later, due to TH's need to look up package information. (We could | > give users the ability to specify a package name manually as a | > workaround.) | > | > If this were added, where would be the best place to put it? th- | lift? | > generic-deriving? template-haskell? A new package (lift-generics)? | > | > Ryan S. | > | > On Mon, Sep 7, 2015 at 3:10 PM, Matthew Pickering | > wrote: | >> Continuing my support of the generics route. Is there a fundamental | >> reason why it couldn't handle unlifted types? Given their relative | >> paucity, it seems like a fair compromise to generically define lift | >> instances for all normal data types but require TH for unlifted | types. | >> This approach seems much smoother from a maintenance perspective. 
| >> | >> On Mon, Sep 7, 2015 at 5:26 PM, Ryan Scott | wrote: | >>> There is a Lift typeclass defined in template-haskell [1] which, | >>> when a data type is an instance, permits it to be directly used in | a | >>> TH quotation, like so | >>> | >>> data Example = Example | >>> | >>> instance Lift Example where | >>> lift Example = conE (mkNameG_d "" | >>> "" "Example") | >>> | >>> e :: Example | >>> e = [| Example |] | >>> | >>> Making Lift instances for most data types is straightforward and | >>> mechanical, so the proposal is to allow automatic derivation of | Lift | >>> via a -XDeriveLift extension: | >>> | >>> data Example = Example deriving Lift | >>> | >>> This is actually a pretty a pretty old proposal [2], dating back | to | >>> 2007. I wanted to have this feature for my needs, so I submitted a | >>> proof-of-concept at the GHC Trac issue page [3]. | >>> | >>> The question now is: do we really want to bake this feature into | GHC? | >>> Since not many people opined on the Trac page, I wanted to submit | >>> this here for wider visibility and to have a discussion. | >>> | >>> Here are some arguments I have heard against this feature (please | >>> tell me if I am misrepresenting your opinion): | >>> | >>> * We already have a th-lift package [4] on Hackage which allows | >>> derivation of Lift via Template Haskell functions. In addition, if | >>> you're using Lift, chances are you're also using the | >>> -XTemplateHaskell extension in the first place, so th-lift should | be suitable. | >>> * The same functionality could be added via GHC generics (as of | GHC | >>> 7.12/8.0, which adds the ability to reify a datatype's package | name | >>> [5]), if -XTemplateHaskell can't be used. | >>> * Adding another -XDerive- extension places a burden on GHC devs | to | >>> maintain it in the future in response to further Template Haskell | >>> changes. 
| >>> | >>> Here are my (opinionated) responses to each of these: | >>> | >>> * th-lift isn't as fully-featured as a -XDerive- extension at the | >>> moment, since it can't do sophisticated type inference [6] or | derive | >>> for data families. This is something that could be addressed with | a | >>> patch to th-lift, though. | >>> * GHC generics wouldn't be enough to handle unlifted types like | >>> Int#, Char#, or Double# (which other -XDerive- extensions do). | >>> * This is a subjective measurement, but in terms of the amount of | >>> code I had to add, -XDeriveLift was substantially simpler than | other | >>> -XDerive extensions, because there are fewer weird corner cases. | >>> Plus, I'd volunteer to maintain it :) | >>> | >>> Simon PJ wanted to know if other Template Haskell programmers | would | >>> find -XDeriveLift useful. Would you be able to use it? Would you | >>> like to see a solution other than putting it into GHC? I'd love to | >>> hear feedback so we can bring some closure to this 8-year-old | >>> feature request. | >>> | >>> Ryan S. 
| >>> | >>> ----- | >>> [1] | >>> http://hackage.haskell.org/package/template-haskell- | 2.10.0.0/docs/La | >>> nguage-Haskell-TH-Syntax.html#t:Lift | >>> [2] | >>> https://mail.haskell.org/pipermail/template-haskell/2007- | October/000 | >>> 635.html [3] https://ghc.haskell.org/trac/ghc/ticket/1830 | >>> [4] http://hackage.haskell.org/package/th-lift | >>> [5] https://ghc.haskell.org/trac/ghc/ticket/10030 | >>> [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11 | >>> _______________________________________________ | >>> ghc-devs mailing list | >>> ghc-devs at haskell.org | >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs | > _______________________________________________ | > ghc-devs mailing list | > ghc-devs at haskell.org | > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs | _______________________________________________ | ghc-devs mailing list | ghc-devs at haskell.org | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From eir at cis.upenn.edu Tue Sep 8 12:05:26 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Tue, 8 Sep 2015 08:05:26 -0400 Subject: Unpacking sum types In-Reply-To: References: <55EE93DC.7050409@gmail.com> <55EEA24C.9080504@gmail.com> Message-ID: <9BA63ECF-DADE-4C32-B707-D494F13E4CE2@cis.upenn.edu> I just added two design notes to the wiki page: 1. If we're stealing syntax, we're stealing quite a few operators. Things like (#|), and (|#) in terms, along with the otherwise-quite-reasonable (x ||). We're also stealing things like (||) and (#||#|) in types. The fact that we're stealing (||) at the type level is quite unfortunate, to me. I won't fight against a growing tide on this issue, but I favor not changing the lexer here and requiring lots of spaces. 2. A previous email in this thread mentioned a (0 of 2 | ...) syntax for data constructors. This might be better than forcing writers and readers to count vertical bars. (Of course, we already require counting commas.) 
Glad to see this coming together! Richard On Sep 8, 2015, at 7:48 AM, Simon Peyton Jones wrote: > | I see, but then you can't have multiple fields, like > | > | ( (# Int,Bool #) |) > | > | You'd have to box the inner tuple too. Ok, I suppose. > > Well of course! It's just a parameterised data type, like a tuple. But, just like unboxed tuples, you could have an unboxed tuple (or sum) inside an unboxed tuple. > > (# (# Int,Bool #) | Int #) > > Simon > > | -----Original Message----- > | From: Simon Marlow [mailto:marlowsd at gmail.com] > | Sent: 08 September 2015 09:55 > | To: Simon Peyton Jones; Johan Tibell; Ryan Newton > | Cc: ghc-devs at haskell.org > | Subject: Re: Unpacking sum types > | > | On 08/09/2015 09:31, Simon Peyton Jones wrote: > | > | How did you envisage implementing anonymous boxed sums? What is > | > | their heap representation? > | > > | > *Exactly* like tuples; that is, we have a family of data type > | declarations: > | > > | > data (a|b) = (_|) a > | > | (|_) b > | > > | > data (a|b|c) = (_||) a > | > | (|_|) b > | > | (||_) c > | > ..etc. > | > | I see, but then you can't have multiple fields, like > | > | ( (# Int,Bool #) |) > | > | You'd have to box the inner tuple too. Ok, I suppose. > | > | Cheers > | Simon > | > | > | > Simon > | > > | > | > | > | One option is to use some kind of generic object with a dynamic > | > | number of pointers and non-pointers, and one field for the tag. > | > | The layout would need to be stored in the object. This isn't a > | > | particularly efficient representation, though. Perhaps there > | could > | > | be a family of smaller specialised versions for common sizes. > | > | > | > | Do we have a use case for the boxed version, or is it just for > | > | consistency? > | > | > | > | Cheers > | > | Simon > | > | > | > | > | > | > Looks good to me! 
> | > | > > | > | > Simon > | > | > > | > | > *From:*Johan Tibell [mailto:johan.tibell at gmail.com] > *Sent:* > | 01 > | > | September 2015 18:24 > *To:* Simon Peyton Jones; Simon Marlow; > | Ryan > | > | Newton > *Cc:* ghc-devs at haskell.org > *Subject:* RFC: Unpacking > | > | sum types > > I have a draft design for unpacking sum types that > | > | I'd like some > feedback on. In particular feedback both on: > | > | > > | > | > * the writing and clarity of the proposal and > | > | > > | > | > * the proposal itself. > | > | > > | > | > https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes > | > | > > | > | > -- Johan > | > | > > | > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simonpj at microsoft.com Tue Sep 8 12:10:19 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 12:10:19 +0000 Subject: ArrayArrays In-Reply-To: <55EE90ED.1040609@gmail.com> References: <2FCB6298-A4FF-4F7B-8BF8-4880BB3154AB@gmail.com> <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> <55EE90ED.1040609@gmail.com> Message-ID: <734d9e915b8446a8ae79c131d6f50c9d@DB4PR30MB030.064d.mgd.msft.net> | Without any unlifted kind, we need | - ArrayArray# | - a set of new/read/write primops for every element type, | either built-in or made from unsafeCoerce# | | With the unlifted kind, we would need | - ArrayArray# | - one set of new/read/write primops | | With levity polymorphism, we would need | - none of this, Array# can be used I don't think levity polymorphism will work here. The code for a function needs to know whether an intermediate value of type 'a' is strict or not. It HAS to choose (unless we compile two versions of every function). So I don't see how to be polymorphic over a type variable that can range over both lifted and unlifted types. 
The only reason that 'error' is levity-polymorphic over both lifted and unlifted types is that it never returns! error :: forall (a :: AnyKind). String -> a the code for error never manipulates a value of type 'a', so all is well. But it's an incredibly special case. Simon From simonpj at microsoft.com Tue Sep 8 12:35:00 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 12:35:00 +0000 Subject: Shared data type for extension flags In-Reply-To: References: Message-ID: <72a513a1fc4345b0a48d9727b7b10d53@DB4PR30MB030.064d.mgd.msft.net> Yes, we?d have to broaden the description of the package. I defer to Edward Yang and Duncan Coutts who have a clearer idea of the architecture in this area. Simon From: Michael Smith [mailto:michael at diglumi.com] Sent: 02 September 2015 17:27 To: Simon Peyton Jones; Matthew Pickering Cc: GHC developers Subject: Re: Shared data type for extension flags The package description for that is "The GHC compiler's view of the GHC package database format", and this doesn't really have to do with the package database format. Would it be okay to put this in there anyway? On Wed, Sep 2, 2015, 07:33 Simon Peyton Jones > wrote: we already have such a shared library, I think: bin-package-db. would that do? Simon From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Michael Smith Sent: 02 September 2015 09:21 To: Matthew Pickering Cc: GHC developers Subject: Re: Shared data type for extension flags That sounds like a good approach. Are there other things that would go nicely in a shared package like this, in addition to the extension data type? On Wed, Sep 2, 2015 at 1:00 AM, Matthew Pickering > wrote: Surely the easiest way here (including for other tooling - ie haskell-src-exts) is to create a package which just provides this enumeration. GHC, cabal, th, haskell-src-exts and so on then all depend on this package rather than creating their own enumeration. 
On Wed, Sep 2, 2015 at 9:47 AM, Michael Smith > wrote: > #10820 on Trac [1] and D1200 on Phabricator [2] discuss adding the > capababilty > to Template Haskell to detect which language extensions enabled. > Unfortunately, > since template-haskell can't depend on ghc (as ghc depends on > template-haskell), > it can't simply re-export the ExtensionFlag type from DynFlags to the user. > > There is a second data type encoding the list of possible language > extensions in > the Cabal package, in Language.Haskell.Extension [3]. But template-haskell > doesn't already depend on Cabal, and doing so seems like it would cause > difficulties, as the two packages can be upgraded separately. > > So adding this new feature to Template Haskell requires introducing a > *third* > data type for language extensions. It also requires enumerating this full > list > in two more places, to convert back and forth between the TH Extension data > type > and GHC's internal ExtensionFlag data type. > > Is there another way here? Can there be one single shared data type for this > somehow? > > [1] https://ghc.haskell.org/trac/ghc/ticket/10820 > [2] https://phabricator.haskell.org/D1200 > [3] > https://hackage.haskell.org/package/Cabal-1.22.4.0/docs/Language-Haskell-Extension.html > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eir at cis.upenn.edu Tue Sep 8 13:45:40 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Tue, 8 Sep 2015 09:45:40 -0400 Subject: Unlifted data types In-Reply-To: References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <0196B07B-156B-4731-B0A1-CE7A892E0680@cis.upenn.edu> I have put up an alternate set of proposals on https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes These sidestep around `Force` and `suspend` but probably have other problems. They make heavy use of levity polymorphism. Back story: this all was developed in a late-evening Haskell Symposium session that took place in the hotel bar. It seems Edward and I walked away with quite different understandings of what had taken place. I've written up my understanding. Most likely, the Right Idea is a combination of this all! See what you think. Thanks! Richard On Sep 8, 2015, at 3:52 AM, Simon Peyton Jones wrote: > | And to > | be honest, I'm not sure we need arbitrary data types in Unlifted; > | Force (which would be primitive) might be enough. > > That's an interesting thought. But presumably you'd have to use 'suspend' (a terrible name) a lot: > > type StrictList a = Force (StrictList' a) > data StrictList' a = Nil | Cons !a (StrictList a) > > mapStrict :: (a -> b) -> StrictList a -> StrictList b > mapStrict f xs = mapStrict' f (suspend xs) > > mapStrict' :: (a -> b) -> StrictList' a -> StrictList' b > mapStrict' f Nil = Nil > mapStrict' f (Cons x xs) = Cons (f x) (mapStrict f xs) > > > That doesn't look terribly convenient. > > | ensure that threads don't simply > | pass thunks between each other. But, if you have unlifted types, then > | you can have: > | > | data UMVar (a :: Unlifted) > | > | and then the type rules out the possibility of passing thunks through > | a reference (at least at the top level). > > Really? Presumably UMVar is a new primitive? 
With a family of operations like MVar? If so can't we just define > newtype UMVar a = UMV (MVar a) > putUMVar :: UMVar a -> a -> IO () > putUMVar (UMVar v) x = x `seq` putMVar v x > > I don't see Force helping here. > > Simon > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From jan.stolarek at p.lodz.pl Tue Sep 8 14:15:02 2015 From: jan.stolarek at p.lodz.pl (Jan Stolarek) Date: Tue, 8 Sep 2015 16:15:02 +0200 Subject: Unlifted data types In-Reply-To: <0196B07B-156B-4731-B0A1-CE7A892E0680@cis.upenn.edu> References: <1441353701-sup-9422@sabre> <0196B07B-156B-4731-B0A1-CE7A892E0680@cis.upenn.edu> Message-ID: <201509081615.03167.jan.stolarek@p.lodz.pl> I think the wiki page is imprecise when it says: > data unlifted UBool = UTrue | UFalse > > Intuitively, if you have x :: UBool in scope, you are guaranteed to have UTrue or UFalse, and > not bottom. But I still can say: foo :: UBool foo = foo and now foo contains bottom. I know that any attempt to use foo will lead to its immediate evaluation, but that is not exactly the same as "not containing a bottom". Or am I missing something here? Janek Dnia wtorek, 8 wrze?nia 2015, Richard Eisenberg napisa?: > I have put up an alternate set of proposals on > > https://ghc.haskell.org/trac/ghc/wiki/UnliftedDataTypes > > These sidestep around `Force` and `suspend` but probably have other > problems. They make heavy use of levity polymorphism. > > Back story: this all was developed in a late-evening Haskell Symposium > session that took place in the hotel bar. It seems Edward and I walked away > with quite different understandings of what had taken place. I've written > up my understanding. Most likely, the Right Idea is a combination of this > all! > > See what you think. > > Thanks! 
> Richard > > On Sep 8, 2015, at 3:52 AM, Simon Peyton Jones wrote: > > | And to > > | be honest, I'm not sure we need arbitrary data types in Unlifted; > > | Force (which would be primitive) might be enough. > > > > That's an interesting thought. But presumably you'd have to use > > 'suspend' (a terrible name) a lot: > > > > type StrictList a = Force (StrictList' a) > > data StrictList' a = Nil | Cons !a (StrictList a) > > > > mapStrict :: (a -> b) -> StrictList a -> StrictList b > > mapStrict f xs = mapStrict' f (suspend xs) > > > > mapStrict' :: (a -> b) -> StrictList' a -> StrictList' b > > mapStrict' f Nil = Nil > > mapStrict' f (Cons x xs) = Cons (f x) (mapStrict f xs) > > > > > > That doesn't look terribly convenient. > > > > | ensure that threads don't simply > > | pass thunks between each other. But, if you have unlifted types, then > > | you can have: > > | > > | data UMVar (a :: Unlifted) > > | > > | and then the type rules out the possibility of passing thunks through > > | a reference (at least at the top level). > > > > Really? Presumably UMVar is a new primitive? With a family of operations > > like MVar? If so can't we just define newtype UMVar a = UMV (MVar a) > > putUMVar :: UMVar a -> a -> IO () > > putUMVar (UMVar v) x = x `seq` putMVar v x > > > > I don't see Force helping here. 
> > > > Simon > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From eir at cis.upenn.edu Tue Sep 8 14:56:31 2015 From: eir at cis.upenn.edu (Richard Eisenberg) Date: Tue, 8 Sep 2015 10:56:31 -0400 Subject: Unlifted data types In-Reply-To: <201509081615.03167.jan.stolarek@p.lodz.pl> References: <1441353701-sup-9422@sabre> <0196B07B-156B-4731-B0A1-CE7A892E0680@cis.upenn.edu> <201509081615.03167.jan.stolarek@p.lodz.pl> Message-ID: <6F1B0D49-C676-44A7-BA30-BE810EC63847@cis.upenn.edu> On Sep 8, 2015, at 10:15 AM, Jan Stolarek wrote: > But I still can say: > > foo :: UBool > foo = foo > > ... Or am I missing > something here? I'm afraid you are. Top-level variables may not have an unlifted type, for exactly this reason. If you were to do this on a local let, your program would loop when it hits the let, so there's no problem there. Richard From simonpj at microsoft.com Tue Sep 8 14:58:23 2015 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Tue, 8 Sep 2015 14:58:23 +0000 Subject: Unlifted data types In-Reply-To: <201509081615.03167.jan.stolarek@p.lodz.pl> References: <1441353701-sup-9422@sabre> <0196B07B-156B-4731-B0A1-CE7A892E0680@cis.upenn.edu> <201509081615.03167.jan.stolarek@p.lodz.pl> Message-ID: | > data unlifted UBool = UTrue | UFalse | > | > Intuitively, if you have x :: UBool in scope, you are guaranteed to | > have UTrue or UFalse, and not bottom. | | But I still can say: | | foo :: UBool | foo = foo | | and now foo contains bottom. You definitely CANNOT have a top-level declaration for a value of an unlifted type, any more than you can have for an Int# or unboxed tuple today. That should resolve your question. 
Simon From marlowsd at gmail.com Tue Sep 8 14:58:38 2015 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 8 Sep 2015 15:58:38 +0100 Subject: ArrayArrays In-Reply-To: <734d9e915b8446a8ae79c131d6f50c9d@DB4PR30MB030.064d.mgd.msft.net> References: <325b043066bb48a79f254b75ba9753ee@DB4PR30MB030.064d.mgd.msft.net> <55EE90ED.1040609@gmail.com> <734d9e915b8446a8ae79c131d6f50c9d@DB4PR30MB030.064d.mgd.msft.net> Message-ID: <55EEF79E.604@gmail.com> On 08/09/2015 13:10, Simon Peyton Jones wrote: > > | Without any unlifted kind, we need > | - ArrayArray# > | - a set of new/read/write primops for every element type, > | either built-in or made from unsafeCoerce# > | > | With the unlifted kind, we would need > | - ArrayArray# > | - one set of new/read/write primops > | > | With levity polymorphism, we would need > | - none of this, Array# can be used > > I don't think levity polymorphism will work here. The code for a function needs to know whether an intermediate value of type 'a' is strict or not. It HAS to choose (unless we compile two versions of every function). So I don't see how to be polymorphic over a type variable that can range over both lifted and unlifted types. > > The only reason that 'error' is levity-polymorphic over both lifted and unlifted types is that it never returns! > error :: forall (a :: AnyKind). String -> a > the code for error never manipulates a value of type 'a', so all is well. But it's an incredibly special case. I think there's a bit of confusion here, Ed's email a bit earlier described the proposal for the third option above: https://mail.haskell.org/pipermail/ghc-devs/2015-September/009867.html For generalising these primops it would be fine, there are no thunks being built. 
Cheers,
Simon

From jan.stolarek at p.lodz.pl  Tue Sep  8 15:26:14 2015
From: jan.stolarek at p.lodz.pl (Jan Stolarek)
Date: Tue, 8 Sep 2015 17:26:14 +0200
Subject: Unlifted data types
In-Reply-To: <6F1B0D49-C676-44A7-BA30-BE810EC63847@cis.upenn.edu>
References: <1441353701-sup-9422@sabre> <201509081615.03167.jan.stolarek@p.lodz.pl> <6F1B0D49-C676-44A7-BA30-BE810EC63847@cis.upenn.edu>
Message-ID: <201509081726.14448.jan.stolarek@p.lodz.pl>

> Top-level variables may not have an unlifted type

Ah, that makes much more sense now. Thanks.

Janek

From alan.zimm at gmail.com  Tue Sep  8 18:49:21 2015
From: alan.zimm at gmail.com (Alan & Kim Zimmerman)
Date: Tue, 8 Sep 2015 20:49:21 +0200
Subject: Haskell Error Messages
Message-ID:

Is there currently any planned work around making the Haskell error messages able to support something like the ones in Idris, as shown in David Christiansen's talk "A Pretty printer that says what it means" at HIW?

https://www.youtube.com/watch?v=m7BBCcIDXSg&list=PLnqUlCo055hVfNkQHP7z43r10yNo-mc7B&index=10

Alan

From ryan.gl.scott at gmail.com  Tue Sep  8 19:01:31 2015
From: ryan.gl.scott at gmail.com (Ryan Scott)
Date: Tue, 8 Sep 2015 15:01:31 -0400
Subject: Proposal: Automatic derivation of Lift
In-Reply-To:
References:
Message-ID:

Sorry, I forgot to reply-all earlier.

> I hacked this up quickly just to show that it works in principle. In
> practice, I think it's good to not just represent Int# as Int, but as
> something like UInt, where
>
>     data UInt = UInt Int#
>
> i.e., it is isomorphic to an Int, but distinguishable. Alternatively,
> have a generic "unboxed" flag that could be inserted as a tag into the
> surrounding K.

I suppose we'd have to decide which is easier for programmers to use. Do we introduce UInt, UChar, et al.
and require that users define instances of the desired typeclass for them:

    instance Lift UInt where
      lift (UInt i) = litE (intPrimL (fromIntegral (I# i)))

or do we introduce an unboxed flag and require users to write generic GLift instances using that flag:

    instance GLift (K1 Unboxed Int) where
      lift (K1 (I# i)) = litE (intPrimL (fromIntegral (I# i)))

The former has the advantage that you wouldn't need to change the GLift code to distinguish between (K1 Unboxed Int) and (K1 R Int), which might be a potential source of confusion for programmers. On the other hand, having an Unboxed flag requires introducing only one new data type, as opposed to a separate data type for each of the unlifted types that we want to work over.

Ryan S.

On Tue, Sep 8, 2015 at 7:59 AM, Andres Loeh wrote:
> I don't think there's any fundamental reason why unboxed fields
> prevent a Generic instance, as long as we're happy that unboxed values
> will be re-boxed in the generic representation. It simply seems as if
> nobody has thought of implementing this. As an example, consider the
> following hand-written example, which works just fine:
>
> {-# LANGUAGE MagicHash, KindSignatures, PolyKinds, TypeOperators,
>              TypeFamilies #-}
> module GenUnboxed where
>
> import GHC.Exts
> import GHC.Generics
> import Generics.Deriving.Eq
>
> data UPair = UPair Int# Char#
>
> instance Generic UPair where
>   type Rep UPair = K1 R Int :*: K1 R Char
>   from (UPair x y) = K1 (I# x) :*: K1 (C# y)
>   to (K1 (I# x) :*: K1 (C# y)) = UPair x y
>
> instance GEq UPair
>
> test :: Bool
> test = let p = UPair 3# 'x'# in geq p p
>
> Cheers,
>   Andres
>
> On Mon, Sep 7, 2015 at 10:02 PM, Ryan Scott wrote:
>> Unlifted types can't be used polymorphically or in instance
>> declarations, so this makes it impossible to do something like
>>
>>     instance Generic Int#
>>
>> or store an Int# in one branch of a (:*:), preventing generics from
>> doing anything in #-land (unless someone has found a way to hack
>> around this).
>>
>> I would be okay with implementing a generics-based approach, but we'd
>> have to add a caveat that it will only work out-of-the-box on GHC 8.0
>> or later, due to TH's need to look up package information. (We could
>> give users the ability to specify a package name manually as a
>> workaround.)
>>
>> If this were added, where would be the best place to put it? th-lift?
>> generic-deriving? template-haskell? A new package (lift-generics)?
>>
>> Ryan S.
>>
>> On Mon, Sep 7, 2015 at 3:10 PM, Matthew Pickering wrote:
>>> Continuing my support of the generics route. Is there a fundamental
>>> reason why it couldn't handle unlifted types? Given their relative
>>> paucity, it seems like a fair compromise to generically define Lift
>>> instances for all normal data types but require TH for unlifted types.
>>> This approach seems much smoother from a maintenance perspective.
>>>
>>> On Mon, Sep 7, 2015 at 5:26 PM, Ryan Scott wrote:
>>>> There is a Lift typeclass defined in template-haskell [1] which, when
>>>> a data type is an instance, permits it to be directly used in a TH
>>>> quotation, like so:
>>>>
>>>>     data Example = Example
>>>>
>>>>     instance Lift Example where
>>>>       lift Example = conE (mkNameG_d "" "" "Example")
>>>>
>>>>     e :: Q Exp
>>>>     e = [| Example |]
>>>>
>>>> Making Lift instances for most data types is straightforward and
>>>> mechanical, so the proposal is to allow automatic derivation of Lift
>>>> via a -XDeriveLift extension:
>>>>
>>>>     data Example = Example deriving Lift
>>>>
>>>> This is actually a pretty old proposal [2], dating back to 2007. I
>>>> wanted to have this feature for my needs, so I submitted a
>>>> proof-of-concept at the GHC Trac issue page [3].
>>>>
>>>> The question now is: do we really want to bake this feature into GHC?
>>>> Since not many people opined on the Trac page, I wanted to submit this
>>>> here for wider visibility and to have a discussion.
>>>>
>>>> Here are some arguments I have heard against this feature (please tell
>>>> me if I am misrepresenting your opinion):
>>>>
>>>> * We already have a th-lift package [4] on Hackage which allows
>>>>   derivation of Lift via Template Haskell functions. In addition, if
>>>>   you're using Lift, chances are you're also using the -XTemplateHaskell
>>>>   extension in the first place, so th-lift should be suitable.
>>>> * The same functionality could be added via GHC generics (as of GHC
>>>>   7.12/8.0, which adds the ability to reify a datatype's package name
>>>>   [5]), if -XTemplateHaskell can't be used.
>>>> * Adding another -XDerive- extension places a burden on GHC devs to
>>>>   maintain it in the future in response to further Template Haskell
>>>>   changes.
>>>>
>>>> Here are my (opinionated) responses to each of these:
>>>>
>>>> * th-lift isn't as fully-featured as a -XDerive- extension at the
>>>>   moment, since it can't do sophisticated type inference [6] or derive
>>>>   for data families. This is something that could be addressed with a
>>>>   patch to th-lift, though.
>>>> * GHC generics wouldn't be enough to handle unlifted types like Int#,
>>>>   Char#, or Double# (which other -XDerive- extensions do).
>>>> * This is a subjective measurement, but in terms of the amount of code
>>>>   I had to add, -XDeriveLift was substantially simpler than other
>>>>   -XDerive- extensions, because there are fewer weird corner cases.
>>>>   Plus, I'd volunteer to maintain it :)
>>>>
>>>> Simon PJ wanted to know if other Template Haskell programmers would
>>>> find -XDeriveLift useful. Would you be able to use it? Would you like
>>>> to see a solution other than putting it into GHC? I'd love to hear
>>>> feedback so we can bring some closure to this 8-year-old feature
>>>> request.
>>>>
>>>> Ryan S.
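For context, the extension proposed in this message did eventually land as -XDeriveLift in GHC 8.0. A minimal sketch of the derived instance in use, assuming GHC 8.0 or later with the bundled template-haskell package (the exact shape of the printed Exp depends on the template-haskell version, so it is not shown):

```haskell
{-# LANGUAGE DeriveLift #-}
module Main where

import Language.Haskell.TH (runQ)
import Language.Haskell.TH.Syntax (Lift (..))

-- With DeriveLift, GHC writes the mechanical Lift instance for us,
-- replacing hand-written conE/mkNameG_d boilerplate like that in the
-- proposal above.
data Example = Example | Pair Int Char
  deriving Lift

main :: IO ()
main = do
  -- 'lift' turns an ordinary value into the TH syntax tree that would
  -- rebuild it; runQ lets us inspect that tree from IO.
  e <- runQ (lift (Pair 1 'x'))
  print e
```

In a real program the payoff is splicing such values back into quotations, e.g. `[| f $(lift x) |]`, without writing any Lift instance by hand.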
>>>>
>>>> -----
>>>> [1] http://hackage.haskell.org/package/template-haskell-2.10.0.0/docs/Language-Haskell-TH-Syntax.html#t:Lift
>>>> [2] https://mail.haskell.org/pipermail/template-haskell/2007-October/000635.html
>>>> [3] https://ghc.haskell.org/trac/ghc/ticket/1830
>>>> [4] http://hackage.haskell.org/package/th-lift
>>>> [5] https://ghc.haskell.org/trac/ghc/ticket/10030
>>>> [6] https://ghc.haskell.org/trac/ghc/ticket/1830#comment:11

From eir at cis.upenn.edu  Wed Sep  9 01:37:18 2015
From: eir at cis.upenn.edu (Richard Eisenberg)
Date: Tue, 8 Sep 2015 21:37:18 -0400
Subject: Haskell Error Messages
In-Reply-To:
References:
Message-ID: <28F3A3E0-3209-4DB6-8455-E6AC740C7E2C@cis.upenn.edu>

Ticket #8809 (https://ghc.haskell.org/trac/ghc/ticket/8809) seems the best spot to look for this.

Richard

On Sep 8, 2015, at 2:49 PM, "Alan & Kim Zimmerman" wrote:

> Is there currently any planned work around making the Haskell error messages able to support something like the ones in Idris, as shown in David Christiansen's talk "A Pretty printer that says what it means" at HIW?
>
> https://www.youtube.com/watch?v=m7BBCcIDXSg&list=PLnqUlCo055hVfNkQHP7z43r10yNo-mc7B&index=10
>
> Alan
From dan.doel at gmail.com  Wed Sep  9 02:43:55 2015
From: dan.doel at gmail.com (Dan Doel)
Date: Tue, 8 Sep 2015 22:43:55 -0400
Subject: Unlifted data types
In-Reply-To:
References: <1441353701-sup-9422@sabre> <6707b31c94d44af89ba2a90580ac46ce@DB4PR30MB030.064d.mgd.msft.net> <6e2bcecf1a284c62a656e80992e9862e@DB4PR30MB030.064d.mgd.msft.net>
Message-ID:

On Tue, Sep 8, 2015 at 3:52 AM, Simon Peyton Jones wrote:
> | And to
> | be honest, I'm not sure we need arbitrary data types in Unlifted;
> | Force (which would be primitive) might be enough.
>
> That's an interesting thought. But presumably you'd have to use 'suspend' (a terrible name) a lot:
>
>     type StrictList a = Force (StrictList' a)
>     data StrictList' a = Nil | Cons !a (StrictList a)
>
>     mapStrict :: (a -> b) -> StrictList a -> StrictList b
>     mapStrict f xs = mapStrict' f (suspend xs)
>
>     mapStrict' :: (a