Proposal: Partitionable goes somewhere + containers instances
Ryan Newton
rrnewton
Mon Oct 7 04:16:40 UTC 2013
Ok, so we've narrowed the focus quite a bit to JUST exposing enough from
containers to enable a third-party library to do all the parallel
traversals it wants. Which of the following limited proposal would people
like more?
(1) Expose Bin/Tip from, say, Data.Map.Internal, as in this patch:
https://github.com/rrnewton/containers/commit/5d6b07f69e8396023101039a4aaab619af41c810
(2) a splitTree function [1]. A patch can be found here:
https://github.com/rrnewton/containers/commit/6153896f0c7e6cdf70656dc6b641ce61711175f8
The argument for (1) would be that it doesn't pollute any namespaces people
actually use at all, and Tip & Bin would seem to be pretty darn stable at
this point. The only consumers of this information in practice would be
downstream companion libraries (like, say, a parallel traversals library
for monad-par & LVish!) Those could be updated if there were ever a
seismic shift in the containers implementations.
Finally, I'd like to propose October 14th as a date to close discussion.
Best,
-Ryan
[1] Here's the relevant definition:
*-- | /O(1)/. Decompose a Map into pieces. No guarantee is made as to the
sizes of*
*-- the pieces, but some of them will be balanced, and some may be empty.*
*splitTree :: Map k b -> Maybe (Map k b, Map k b, Map k b)*
*splitTree orig =*
* case orig of *
* Tip -> Nothing*
* Bin 1 k v l r -> Just (singleton k v, l, r)*
*{-# INLINE splitTree #-}*
You could argue for returning a list instead, but I'm betting they'll be
harder to deforest. (I'll have to test it.) And the above should be
emulatable by any future implementation...
On Mon, Sep 30, 2013 at 11:28 AM, Ryan Newton <rrnewton at gmail.com> wrote:
> Edward,
>
> The problem is that I need *something* more from the containers library to
> be able to construct this as a separate library. I don't think I can use
> foldMap to implement a Splittable/Partitionable instance for Data.Set,
> namely because I specifically want to do O(1) work instead of any kind of
> full traversal of the structure.
>
> Is the least possible disruption here to just have a Data.Map.Internal
> that exposes Tip and Bin? It can be marked with suitable warnings at the
> top of the module.
>
> Or would the preference to be to expose something more abstract of type
> "Map k a -> [Map k a]" that chops it into the "natural pieces"? [1]
>
> -Ryan
>
> [1] Btw, it seems like returning a tuple here might make deforestation
> more likely than returning a list... right?
>
>
> On Mon, Sep 30, 2013 at 9:52 AM, Edward Kmett <ekmett at gmail.com> wrote:
>
>> Upon consideration from a package management perspective this is probably
>> easiest done by building a new small package to provide the functionality
>> you want. That way we don't haphazardly change the transitive dependencies
>> of a big chunk of the ecosystem and it can rest atop the various containers
>> libraries. This also gives you a lot of opportunity to iterate on the API
>> in public without incurring the instant rigidity of the Haskell Platform.
>>
>>
>> On Sun, Sep 29, 2013 at 11:06 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Thanks Edward. Good point about Brent's 'split' package. That would be
>>> a really nice place to put the class. But it doesn't currently depend on
>>> containers or vector so I suppose the other instances would need to go
>>> somewhere else. (Assuming containers only exported monomorphic versions.)
>>>
>>> Maybe a next step would be proposing some monomorphic variants for the
>>> containers package.
>>>
>>> I think the complicated bit will be describing how "best-efforty"
>>> splitting variants are:
>>>
>>> - Is it guaranteed O(1) time and allocation?
>>> - Is the provided Int an upper bound? Lower(ish) bound? Or just a
>>> hint?
>>>
>>> With some data structures, there will be a trade-off between partition
>>> imbalance and the work required to achieve balance. But with some data
>>> structures it is happily not a problem (e.g. Vector)!
>>>
>>> But whether there's one variant or a few, I'd be happy either way, as
>>> long as I get at least the cheap one (i.e. prefer imbalance to
>>> restructuring).
>>>
>>> -Ryan
>>>
>>>
>>>
>>>
>>> On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekmett at gmail.com> wrote:
>>>
>>>> I don't know that it belongs in the "standard" libraries, but there
>>>> could definitely be a package for something similar.
>>>>
>>>> ConstraintKinds are a pretty hefty extension to throw at it, and the
>>>> signature written there prevents it from being used on ByteString, Text,
>>>> etc.
>>>>
>>>> This can be implemented with much lighter weight types though!
>>>>
>>>>
>>>> class Partitionable t where
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> partition :: Int -> t -> [t]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Now ByteString, Text etc. can be instances and no real flexibility is
>>>> lost, as with the class associated constraint on the argument, you'd
>>>> already given up polymorphic recursion.
>>>>
>>>> There still remain issues. partition is already established as the
>>>> filter that returns both the matching and unmatching elements, so the
>>>> name is wrong.
>>>>
>>>> This is a generalization of Data.List.splitEvery, perhaps it is worth
>>>> seeing how many others can be generalized similarly and talk to Brent about
>>>> adding, say, a Data.Split module to his split package in the platform?
>>>>
>>>> -Edward
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>
>>>>> <subject change>
>>>>>
>>>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>>>>
>>>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>>>
>>>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>>>
>>>>>
>>>>> Mike -- Neat, that's a cool library!
>>>>>
>>>>> Edward -- ideally, where in the standard libraries should the
>>>>> Partitionable comonoid go?
>>>>>
>>>>> Btw, I'm not sure what the ideal return type for comappend is, given
>>>>> that it needs to be able to "bottom out". Mike, our partition function's
>>>>> list return type seems more reasonable. Or maybe something simple would be
>>>>> this:
>>>>>
>>>>> *class Partitionable t where*
>>>>> * partition :: t -> Maybe (t,t)*
>>>>>
>>>>> That is, at some point its not worth splitting and returns Nothing,
>>>>> and you'd better be able to deal with the 't' directly.
>>>>>
>>>>> So what I really want is for the *containers package to please get
>>>>> some kind of Partitionable instances! * Johan & others, I would be
>>>>> happy to provide a patch if the class can be agreed on. This is important
>>>>> because currently the balanced tree structure of Data.Set/Map is an *amazing
>>>>> and beneficial property* that is *not* exposed at all through the
>>>>> API.
>>>>> For example, it would be great to have a parallel traverse_ for
>>>>> Maps and Sets in the Par monad. The particular impetus is that our
>>>>> new and enhanced Par monad makes extensive use of Maps and Sets, both the
>>>>> pure, balanced ones, and lockfree/inplace ones based on concurrent skip
>>>>> lists:
>>>>>
>>>>> http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
>>>>>
>>>>> Alternatively, it would be ok if there were a "Data.Map.Internal"
>>>>> module that exposed the Bin/Tip, but I assume people would rather have a
>>>>> clean Partitionable instance...
>>>>>
>>>>> Best,
>>>>> -Ryan
>>>>>
>>>>>
>>>>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>>>>>
>>>>>> I've got a Partitionable class that I've been using for this purpose:
>>>>>>
>>>>>>
>>>>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>>>>>
>>>>>> The function called "parallel" in the HLearn library will
>>>>>> automatically parallelize any homomorphism from a Partionable to a Monoid.
>>>>>> I specifically use that to parallelize machine learning algorithms.
>>>>>>
>>>>>> I have two thoughts for better abstractions:
>>>>>>
>>>>>> 1) This Partitionable class is essentially a comonoid. By reversing
>>>>>> the arrows of mappend, we get:
>>>>>>
>>>>>> comappend :: a -> (a,a)
>>>>>>
>>>>>> By itself, this works well if the number of processors you have is a
>>>>>> power of two, but it needs some more fanciness to get things balanced
>>>>>> properly for other numbers of processors. I bet there's another algebraic
>>>>>> structure that would capture these other cases, but I'm not sure what it is.
>>>>>>
>>>>>> 2) I'm working with parallelizing tree structures right now
>>>>>> (kd-trees, cover trees, oct-trees, etc.). The real problem is not
>>>>>> splitting the number of data points equally (this is easy), but splitting
>>>>>> the amount of work equally. Some points take longer to process than
>>>>>> others, and this cannot be determined in advance. Therefore, an equal
>>>>>> split of the data points can result in one processor getting 25% of the
>>>>>> work load, and the second processor getting 75%. Some sort of lazy
>>>>>> Partitionable class that was aware of processor loads and didn't split data
>>>>>> points until they were needed would be ideal for this scenario.
>>>>>>
>>>>>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam at gmail.com>wrote:
>>>>>>
>>>>>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton at gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> > We all know and love Data.Foldable and are familiar with left
>>>>>>> folds and
>>>>>>> > right folds. But what you want in a parallel program is a
>>>>>>> balanced fold
>>>>>>> > over a tree. Fortunately, many of our datatypes (Sets, Maps)
>>>>>>> actually ARE
>>>>>>> > balanced trees. Hmm, but how do we expose that?
>>>>>>>
>>>>>>> Hi Ryan,
>>>>>>>
>>>>>>> At least for Data.Map, the Foldable instance seems to have a
>>>>>>> reasonably balanced fold called fold (or foldMap):
>>>>>>>
>>>>>>> > fold t = go t
>>>>>>> > where go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)
>>>>>>>
>>>>>>> This doesn't seem to be guaranteed though. For example ghc's derived
>>>>>>> instance writes the foldr only, so fold would be right-associated for
>>>>>>> a:
>>>>>>>
>>>>>>> > data T a = B (T a) (T a) | L a deriving (Foldable)
>>>>>>>
>>>>>>> Regards,
>>>>>>> Adam
>>>>>>> _______________________________________________
>>>>>>> Haskell-Cafe mailing list
>>>>>>> Haskell-Cafe at haskell.org
>>>>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20131007/918e2c84/attachment-0001.html>
More information about the Libraries
mailing list