Proposal: Partitionable goes somewhere + containers instances
Edward Kmett
ekmett at gmail.com
Mon Sep 30 13:52:58 UTC 2013
Upon consideration from a package management perspective this is probably
easiest done by building a new small package to provide the functionality
you want. That way we don't haphazardly change the transitive dependencies
of a big chunk of the ecosystem and it can rest atop the various containers
libraries. This also gives you a lot of opportunity to iterate on the API
in public without incurring the instant rigidity of the Haskell Platform.
On Sun, Sep 29, 2013 at 11:06 PM, Ryan Newton <rrnewton at gmail.com> wrote:
> Thanks Edward. Good point about Brent's 'split' package. That would be a
> really nice place to put the class. But it doesn't currently depend on
> containers or vector so I suppose the other instances would need to go
> somewhere else. (Assuming containers only exported monomorphic versions.)
>
> Maybe a next step would be proposing some monomorphic variants for the
> containers package.
>
> I think the complicated bit will be describing how "best-efforty"
> splitting variants are:
>
> - Is it guaranteed O(1) time and allocation?
> - Is the provided Int an upper bound? Lower(ish) bound? Or just a
> hint?
>
> With some data structures, there will be a trade-off between partition
> imbalance and the work required to achieve balance. But with some data
> structures it is happily not a problem (e.g. Vector)!
>
> But whether there's one variant or a few, I'd be happy either way, as long
> as I get at least the cheap one (i.e. prefer imbalance to restructuring).
>
> -Ryan
>
>
>
>
> On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekmett at gmail.com> wrote:
>
>> I don't know that it belongs in the "standard" libraries, but there could
>> definitely be a package for something similar.
>>
>> ConstraintKinds are a pretty hefty extension to throw at it, and the
>> signature written there prevents it from being used on ByteString, Text,
>> etc.
>>
>> This can be implemented with much lighter weight types though!
>>
>>
>> class Partitionable t where
>>
>>
>> partition :: Int -> t -> [t]
>>
>>
>>
>> Now ByteString, Text etc. can be instances and no real flexibility is
>> lost, as with the class associated constraint on the argument, you'd
>> already given up polymorphic recursion.
>>
>> There still remain issues. partition is already established as the filterthat returns both the matching and unmatching elements, so the name is
>> wrong.
>>
>> This is a generalization of Data.List.splitEvery, perhaps it is worth
>> seeing how many others can be generalized similarly and talk to Brent about
>> adding, say, a Data.Split module to his split package in the platform?
>>
>> -Edward
>>
>>
>>
>>
>>
>> On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> <subject change>
>>>
>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>>
>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>
>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>
>>>
>>> Mike -- Neat, that's a cool library!
>>>
>>> Edward -- ideally, where in the standard libraries should the
>>> Partitionable comonoid go?
>>>
>>> Btw, I'm not sure what the ideal return type for comappend is, given
>>> that it needs to be able to "bottom out". Mike, our partition function's
>>> list return type seems more reasonable. Or maybe something simple would be
>>> this:
>>>
>>> *class Partitionable t where*
>>> * partition :: t -> Maybe (t,t)*
>>>
>>> That is, at some point its not worth splitting and returns Nothing, and
>>> you'd better be able to deal with the 't' directly.
>>>
>>> So what I really want is for the *containers package to please get some
>>> kind of Partitionable instances! * Johan & others, I would be happy to
>>> provide a patch if the class can be agreed on. This is important because
>>> currently the balanced tree structure of Data.Set/Map is an *amazing
>>> and beneficial property* that is *not* exposed at all through the API.
>>> For example, it would be great to have a parallel traverse_ for Maps
>>> and Sets in the Par monad. The particular impetus is that our new and
>>> enhanced Par monad makes extensive use of Maps and Sets, both the pure,
>>> balanced ones, and lockfree/inplace ones based on concurrent skip lists:
>>>
>>> http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
>>>
>>> Alternatively, it would be ok if there were a "Data.Map.Internal" module
>>> that exposed the Bin/Tip, but I assume people would rather have a clean
>>> Partitionable instance...
>>>
>>> Best,
>>> -Ryan
>>>
>>>
>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>>
>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>
>>>>
>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>
>>>> The function called "parallel" in the HLearn library will automatically
>>>> parallelize any homomorphism from a Partionable to a Monoid. I
>>>> specifically use that to parallelize machine learning algorithms.
>>>>
>>>> I have two thoughts for better abstractions:
>>>>
>>>> 1) This Partitionable class is essentially a comonoid. By reversing
>>>> the arrows of mappend, we get:
>>>>
>>>> comappend :: a -> (a,a)
>>>>
>>>> By itself, this works well if the number of processors you have is a
>>>> power of two, but it needs some more fanciness to get things balanced
>>>> properly for other numbers of processors. I bet there's another algebraic
>>>> structure that would capture these other cases, but I'm not sure what it is.
>>>>
>>>> 2) I'm working with parallelizing tree structures right now (kd-trees,
>>>> cover trees, oct-trees, etc.). The real problem is not splitting the
>>>> number of data points equally (this is easy), but splitting the amount of
>>>> work equally. Some points take longer to process than others, and this
>>>> cannot be determined in advance. Therefore, an equal split of the data
>>>> points can result in one processor getting 25% of the work load, and the
>>>> second processor getting 75%. Some sort of lazy Partitionable class that
>>>> was aware of processor loads and didn't split data points until they were
>>>> needed would be ideal for this scenario.
>>>>
>>>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam at gmail.com> wrote:
>>>>
>>>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton at gmail.com>
>>>>> wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > We all know and love Data.Foldable and are familiar with left folds
>>>>> and
>>>>> > right folds. But what you want in a parallel program is a balanced
>>>>> fold
>>>>> > over a tree. Fortunately, many of our datatypes (Sets, Maps)
>>>>> actually ARE
>>>>> > balanced trees. Hmm, but how do we expose that?
>>>>>
>>>>> Hi Ryan,
>>>>>
>>>>> At least for Data.Map, the Foldable instance seems to have a
>>>>> reasonably balanced fold called fold (or foldMap):
>>>>>
>>>>> > fold t = go t
>>>>> > where go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)
>>>>>
>>>>> This doesn't seem to be guaranteed though. For example ghc's derived
>>>>> instance writes the foldr only, so fold would be right-associated for
>>>>> a:
>>>>>
>>>>> > data T a = B (T a) (T a) | L a deriving (Foldable)
>>>>>
>>>>> Regards,
>>>>> Adam
>>>>> _______________________________________________
>>>>> Haskell-Cafe mailing list
>>>>> Haskell-Cafe at haskell.org
>>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130930/aaaff35a/attachment.html>
More information about the Libraries
mailing list