Proposal: Partitionable goes somewhere + containers instances

Ryan Newton rrnewton at gmail.com
Mon Sep 30 03:06:05 UTC 2013


Thanks Edward.  Good point about Brent's 'split' package.  That would be a
really nice place to put the class.  But it doesn't currently depend on
containers or vector so I suppose the other instances would need to go
somewhere else.  (Assuming containers only exported monomorphic versions.)

Maybe a next step would be proposing some monomorphic variants for the
containers package.

I think the complicated bit will be describing how "best-efforty" splitting
variants are:

   - Is it guaranteed O(1) time and allocation?
   - Is the provided Int an upper bound?  Lower(ish) bound?  Or just a hint?

With some data structures, there will be a trade-off between partition
imbalance and the work required to achieve balance.  But with some data
structures it is happily not a problem (e.g. Vector)!

But whether there's one variant or a few, I'd be happy either way, as long
as I get at least the cheap one (i.e. prefer imbalance to restructuring).

  -Ryan




On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekmett at gmail.com> wrote:

> I don't know that it belongs in the "standard" libraries, but there could
> definitely be a package for something similar.
>
> ConstraintKinds are a pretty hefty extension to throw at it, and the
> signature written there prevents it from being used on ByteString, Text,
> etc.
>
> This can be implemented with much lighter weight types though!
>
>
> class Partitionable t where
>
>     partition :: Int -> t -> [t]
>
>
> Now ByteString, Text etc. can be instances and no real flexibility is
> lost, as with the class associated constraint on the argument, you'd
> already given up polymorphic recursion.
>
> There still remain issues. partition is already established as the filterthat returns both the matching and unmatching elements, so the name is
> wrong.
>
> This is a generalization of Data.List.splitEvery, perhaps it is worth
> seeing how many others can be generalized similarly and talk to Brent about
> adding, say, a Data.Split module to his split package in the platform?
>
> -Edward
>
>
>
>
>
> On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> <subject change>
>>
>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>
>>> I've got a Partitionable class that I've been using for this purpose:
>>>
>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>
>>
>> Mike -- Neat, that's a cool library!
>>
>> Edward --  ideally, where in the standard libraries should the
>> Partitionable comonoid go?
>>
>> Btw, I'm not sure what the ideal return type for comappend is, given that
>> it needs to be able to "bottom out".  Mike, our partition function's list
>> return type seems more reasonable.  Or maybe something simple would be this:
>>
>> *class Partitionable t where*
>> *  partition :: t -> Maybe (t,t)*
>>
>> That is, at some point its not worth splitting and returns Nothing, and
>> you'd better be able to deal with the 't' directly.
>>
>> So what I really want is for the *containers package to please get some
>> kind of Partitionable instances! * Johan & others, I would be happy to
>> provide a patch if the class can be agreed on. This is important because
>> currently the balanced tree structure of Data.Set/Map is an *amazing and
>> beneficial property* that is *not* exposed at all through the API.
>>    For example, it would be great to have a parallel traverse_ for Maps
>> and Sets in the Par monad.  The particular impetus is that our new and
>> enhanced Par monad makes extensive use of Maps and Sets, both the pure,
>> balanced ones, and lockfree/inplace ones based on concurrent skip lists:
>>
>>     http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
>>
>> Alternatively, it would be ok if there were a "Data.Map.Internal" module
>> that exposed the Bin/Tip, but I assume people would rather have a clean
>> Partitionable instance...
>>
>> Best,
>>   -Ryan
>>
>>
>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>
>>> I've got a Partitionable class that I've been using for this purpose:
>>>
>>>
>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>
>>> The function called "parallel" in the HLearn library will automatically
>>> parallelize any homomorphism from a Partionable to a Monoid.  I
>>> specifically use that to parallelize machine learning algorithms.
>>>
>>> I have two thoughts for better abstractions:
>>>
>>> 1)  This Partitionable class is essentially a comonoid.  By reversing
>>> the arrows of mappend, we get:
>>>
>>> comappend :: a -> (a,a)
>>>
>>> By itself, this works well if the number of processors you have is a
>>> power of two, but it needs some more fanciness to get things balanced
>>> properly for other numbers of processors.  I bet there's another algebraic
>>> structure that would capture these other cases, but I'm not sure what it is.
>>>
>>> 2) I'm working with parallelizing tree structures right now (kd-trees,
>>> cover trees, oct-trees, etc.).  The real problem is not splitting the
>>> number of data points equally (this is easy), but splitting the amount of
>>> work equally.  Some points take longer to process than others, and this
>>> cannot be determined in advance.  Therefore, an equal split of the data
>>> points can result in one processor getting 25% of the work load, and the
>>> second processor getting 75%.  Some sort of lazy Partitionable class that
>>> was aware of processor loads and didn't split data points until they were
>>> needed would be ideal for this scenario.
>>>
>>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam at gmail.com> wrote:
>>>
>>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton at gmail.com>
>>>> wrote:
>>>> > Hi all,
>>>> >
>>>> > We all know and love Data.Foldable and are familiar with left folds
>>>> and
>>>> > right folds.  But what you want in a parallel program is a balanced
>>>> fold
>>>> > over a tree.  Fortunately, many of our datatypes (Sets, Maps)
>>>> actually ARE
>>>> > balanced trees.  Hmm, but how do we expose that?
>>>>
>>>> Hi Ryan,
>>>>
>>>> At least for Data.Map, the Foldable instance seems to have a
>>>> reasonably balanced fold called fold (or foldMap):
>>>>
>>>> >  fold t = go t
>>>> >    where   go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)
>>>>
>>>> This doesn't seem to be guaranteed though. For example ghc's derived
>>>> instance writes the foldr only, so fold would be right-associated for
>>>> a:
>>>>
>>>> > data T a = B (T a) (T a) | L a deriving (Foldable)
>>>>
>>>> Regards,
>>>> Adam
>>>> _______________________________________________
>>>> Haskell-Cafe mailing list
>>>> Haskell-Cafe at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130929/3c0f5875/attachment.html>


More information about the Libraries mailing list