Proposal: Partitionable goes somewhere + containers instances

John Lato jwlato
Mon Oct 7 04:27:45 UTC 2013


I'm in favor of Option 1 (exposing Bin/Tip) on general principles.


On Sun, Oct 6, 2013 at 11:16 PM, Ryan Newton <rrnewton at gmail.com> wrote:

> Ok, so we've narrowed the focus quite a bit to JUST exposing enough from
> containers to enable a third-party library to do all the parallel
> traversals it wants.  Which of the following limited proposal would people
> like more?
>
> (1) Expose Bin/Tip from, say, Data.Map.Internal, as in this patch:
>
> https://github.com/rrnewton/containers/commit/5d6b07f69e8396023101039a4aaab619af41c810
>
> (2) a splitTree function [1].  A patch can be found here:
> https://github.com/rrnewton/containers/commit/6153896f0c7e6cdf70656dc6b641ce61711175f8
>
> The argument for (1) would be that it doesn't pollute any namespaces
> people actually use at all, and Tip & Bin would seem to be pretty darn
> stable at this point.  The only consumers of this information in practice
> would be downstream companion libraries (like, say, a parallel traversals
> library for monad-par & LVish!)  Those could be updated if there were ever
> a seismic shift in the containers implementations.
>
> Finally, I'd like to propose October 14th as a date to close discussion.
>
> Best,
>    -Ryan
>
> [1]  Here's the relevant definition:
>
> *-- | /O(1)/.  Decompose a Map into pieces.  No guarantee is made as to
> the sizes of*
> *-- the pieces, but some of them will be balanced, and some may be empty.*
> *splitTree :: Map k b -> Maybe (Map k b, Map k b, Map k b)*
> *splitTree orig =*
> *  case orig of *
> *    Tip           -> Nothing*
> *    Bin 1 k v l r -> Just (singleton k v, l, r)*
> *{-# INLINE splitTree #-}*
>
> You could argue for returning a list instead, but I'm betting they'll be
> harder to deforest.  (I'll have to test it.) And the above should be
> emulatable by any future implementation...
>
> On Mon, Sep 30, 2013 at 11:28 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> Edward,
>>
>> The problem is that I need *something* more from the containers library
>> to be able to construct this as a separate library.  I don't think I can
>> use foldMap to implement a Splittable/Partitionable instance for Data.Set,
>> namely because I specifically want to do O(1) work instead of any kind of
>> full traversal of the structure.
>>
>> Is the least possible disruption here to just have a Data.Map.Internal
>> that exposes Tip and Bin?  It can be marked with suitable warnings at the
>> top of the module.
>>
>> Or would the preference to be to expose something more abstract of type
>> "Map k a -> [Map k a]" that chops it into the "natural pieces"? [1]
>>
>>   -Ryan
>>
>> [1] Btw, it seems like returning a tuple here might make deforestation
>> more likely than returning a list... right?
>>
>>
>> On Mon, Sep 30, 2013 at 9:52 AM, Edward Kmett <ekmett at gmail.com> wrote:
>>
>>> Upon consideration from a package management perspective this is
>>> probably easiest done by building a new small package to provide the
>>> functionality you want. That way we don't haphazardly change the transitive
>>> dependencies of a big chunk of the ecosystem and it can rest atop the
>>> various containers libraries. This also gives you a lot of opportunity to
>>> iterate on the API in public without incurring the instant rigidity of the
>>> Haskell Platform.
>>>
>>>
>>> On Sun, Sep 29, 2013 at 11:06 PM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>
>>>> Thanks Edward.  Good point about Brent's 'split' package.  That would
>>>> be a really nice place to put the class.  But it doesn't currently depend
>>>> on containers or vector so I suppose the other instances would need to go
>>>> somewhere else.  (Assuming containers only exported monomorphic versions.)
>>>>
>>>> Maybe a next step would be proposing some monomorphic variants for the
>>>> containers package.
>>>>
>>>> I think the complicated bit will be describing how "best-efforty"
>>>> splitting variants are:
>>>>
>>>>    - Is it guaranteed O(1) time and allocation?
>>>>    - Is the provided Int an upper bound?  Lower(ish) bound?  Or just a
>>>>    hint?
>>>>
>>>> With some data structures, there will be a trade-off between partition
>>>> imbalance and the work required to achieve balance.  But with some data
>>>> structures it is happily not a problem (e.g. Vector)!
>>>>
>>>> But whether there's one variant or a few, I'd be happy either way, as
>>>> long as I get at least the cheap one (i.e. prefer imbalance to
>>>> restructuring).
>>>>
>>>>   -Ryan
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekmett at gmail.com> wrote:
>>>>
>>>>> I don't know that it belongs in the "standard" libraries, but there
>>>>> could definitely be a package for something similar.
>>>>>
>>>>> ConstraintKinds are a pretty hefty extension to throw at it, and the
>>>>> signature written there prevents it from being used on ByteString, Text,
>>>>> etc.
>>>>>
>>>>> This can be implemented with much lighter weight types though!
>>>>>
>>>>>
>>>>> class Partitionable t where
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     partition :: Int -> t -> [t]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Now ByteString, Text etc. can be instances and no real flexibility is
>>>>> lost, as with the class associated constraint on the argument, you'd
>>>>> already given up polymorphic recursion.
>>>>>
>>>>> There still remain issues. partition is already established as the
>>>>> filter that returns both the matching and unmatching elements, so the
>>>>> name is wrong.
>>>>>
>>>>> This is a generalization of Data.List.splitEvery, perhaps it is worth
>>>>> seeing how many others can be generalized similarly and talk to Brent about
>>>>> adding, say, a Data.Split module to his split package in the platform?
>>>>>
>>>>> -Edward
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>
>>>>>> <subject change>
>>>>>>
>>>>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me>
>>>>>>  wrote:
>>>>>>
>>>>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>>>>
>>>>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>>>>
>>>>>>
>>>>>> Mike -- Neat, that's a cool library!
>>>>>>
>>>>>> Edward --  ideally, where in the standard libraries should the
>>>>>> Partitionable comonoid go?
>>>>>>
>>>>>> Btw, I'm not sure what the ideal return type for comappend is, given
>>>>>> that it needs to be able to "bottom out".  Mike, our partition function's
>>>>>> list return type seems more reasonable.  Or maybe something simple would be
>>>>>> this:
>>>>>>
>>>>>> *class Partitionable t where*
>>>>>> *  partition :: t -> Maybe (t,t)*
>>>>>>
>>>>>> That is, at some point its not worth splitting and returns Nothing,
>>>>>> and you'd better be able to deal with the 't' directly.
>>>>>>
>>>>>> So what I really want is for the *containers package to please get
>>>>>> some kind of Partitionable instances! * Johan & others, I would be
>>>>>> happy to provide a patch if the class can be agreed on. This is important
>>>>>> because currently the balanced tree structure of Data.Set/Map is an *amazing
>>>>>> and beneficial property* that is *not* exposed at all through the
>>>>>> API.
>>>>>>    For example, it would be great to have a parallel traverse_ for
>>>>>> Maps and Sets in the Par monad.  The particular impetus is that our
>>>>>> new and enhanced Par monad makes extensive use of Maps and Sets, both the
>>>>>> pure, balanced ones, and lockfree/inplace ones based on concurrent skip
>>>>>> lists:
>>>>>>
>>>>>>     http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
>>>>>>
>>>>>> Alternatively, it would be ok if there were a "Data.Map.Internal"
>>>>>> module that exposed the Bin/Tip, but I assume people would rather have a
>>>>>> clean Partitionable instance...
>>>>>>
>>>>>> Best,
>>>>>>   -Ryan
>>>>>>
>>>>>>
>>>>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me>wrote:
>>>>>>
>>>>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>>>>
>>>>>>> The function called "parallel" in the HLearn library will
>>>>>>> automatically parallelize any homomorphism from a Partionable to a Monoid.
>>>>>>> I specifically use that to parallelize machine learning algorithms.
>>>>>>>
>>>>>>> I have two thoughts for better abstractions:
>>>>>>>
>>>>>>> 1)  This Partitionable class is essentially a comonoid.  By
>>>>>>> reversing the arrows of mappend, we get:
>>>>>>>
>>>>>>> comappend :: a -> (a,a)
>>>>>>>
>>>>>>> By itself, this works well if the number of processors you have is a
>>>>>>> power of two, but it needs some more fanciness to get things balanced
>>>>>>> properly for other numbers of processors.  I bet there's another algebraic
>>>>>>> structure that would capture these other cases, but I'm not sure what it is.
>>>>>>>
>>>>>>> 2) I'm working with parallelizing tree structures right now
>>>>>>> (kd-trees, cover trees, oct-trees, etc.).  The real problem is not
>>>>>>> splitting the number of data points equally (this is easy), but splitting
>>>>>>> the amount of work equally.  Some points take longer to process than
>>>>>>> others, and this cannot be determined in advance.  Therefore, an equal
>>>>>>> split of the data points can result in one processor getting 25% of the
>>>>>>> work load, and the second processor getting 75%.  Some sort of lazy
>>>>>>> Partitionable class that was aware of processor loads and didn't split data
>>>>>>> points until they were needed would be ideal for this scenario.
>>>>>>>
>>>>>>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam at gmail.com>wrote:
>>>>>>>
>>>>>>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton at gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi all,
>>>>>>>> >
>>>>>>>> > We all know and love Data.Foldable and are familiar with left
>>>>>>>> folds and
>>>>>>>> > right folds.  But what you want in a parallel program is a
>>>>>>>> balanced fold
>>>>>>>> > over a tree.  Fortunately, many of our datatypes (Sets, Maps)
>>>>>>>> actually ARE
>>>>>>>> > balanced trees.  Hmm, but how do we expose that?
>>>>>>>>
>>>>>>>> Hi Ryan,
>>>>>>>>
>>>>>>>> At least for Data.Map, the Foldable instance seems to have a
>>>>>>>> reasonably balanced fold called fold (or foldMap):
>>>>>>>>
>>>>>>>> >  fold t = go t
>>>>>>>> >    where   go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)
>>>>>>>>
>>>>>>>> This doesn't seem to be guaranteed though. For example ghc's derived
>>>>>>>> instance writes the foldr only, so fold would be right-associated
>>>>>>>> for
>>>>>>>> a:
>>>>>>>>
>>>>>>>> > data T a = B (T a) (T a) | L a deriving (Foldable)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Adam
>>>>>>>> _______________________________________________
>>>>>>>> Haskell-Cafe mailing list
>>>>>>>> Haskell-Cafe at haskell.org
>>>>>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20131006/fb37b5f3/attachment-0001.html>




More information about the Libraries mailing list