[Haskell-cafe] Proposal: Partitionable goes somewhere + containers instances
Edward Kmett
ekmett at gmail.com
Sun Sep 29 14:20:04 CEST 2013
I don't know that it belongs in the "standard" libraries, but there could
definitely be a package for something similar.
ConstraintKinds are a pretty hefty extension to throw at it, and the
signature written there prevents it from being used on ByteString, Text,
etc.
This can be implemented with much lighter weight types though!
class Partitionable t where
partition :: Int -> t -> [t]
Now ByteString, Text etc. can be instances and no real flexibility is lost,
as with the class associated constraint on the argument, you'd already
given up polymorphic recursion.
There still remain issues. partition is already established as the
filterthat returns both the matching and unmatching elements, so the
name is
wrong.
This is a generalization of Data.List.splitEvery, perhaps it is worth
seeing how many others can be generalized similarly and talk to Brent about
adding, say, a Data.Split module to his split package in the platform?
-Edward
On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnewton at gmail.com> wrote:
> <subject change>
>
> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>
>> I've got a Partitionable class that I've been using for this purpose:
>>
>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>
>
> Mike -- Neat, that's a cool library!
>
> Edward -- ideally, where in the standard libraries should the
> Partitionable comonoid go?
>
> Btw, I'm not sure what the ideal return type for comappend is, given that
> it needs to be able to "bottom out". Mike, our partition function's list
> return type seems more reasonable. Or maybe something simple would be this:
>
> *class Partitionable t where*
> * partition :: t -> Maybe (t,t)*
>
> That is, at some point its not worth splitting and returns Nothing, and
> you'd better be able to deal with the 't' directly.
>
> So what I really want is for the *containers package to please get some
> kind of Partitionable instances! * Johan & others, I would be happy to
> provide a patch if the class can be agreed on. This is important because
> currently the balanced tree structure of Data.Set/Map is an *amazing and
> beneficial property* that is *not* exposed at all through the API.
> For example, it would be great to have a parallel traverse_ for Maps
> and Sets in the Par monad. The particular impetus is that our new and
> enhanced Par monad makes extensive use of Maps and Sets, both the pure,
> balanced ones, and lockfree/inplace ones based on concurrent skip lists:
>
> http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
>
> Alternatively, it would be ok if there were a "Data.Map.Internal" module
> that exposed the Bin/Tip, but I assume people would rather have a clean
> Partitionable instance...
>
> Best,
> -Ryan
>
>
> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <mike at izbicki.me> wrote:
>
>> I've got a Partitionable class that I've been using for this purpose:
>>
>>
>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs
>>
>> The function called "parallel" in the HLearn library will automatically
>> parallelize any homomorphism from a Partionable to a Monoid. I
>> specifically use that to parallelize machine learning algorithms.
>>
>> I have two thoughts for better abstractions:
>>
>> 1) This Partitionable class is essentially a comonoid. By reversing the
>> arrows of mappend, we get:
>>
>> comappend :: a -> (a,a)
>>
>> By itself, this works well if the number of processors you have is a
>> power of two, but it needs some more fanciness to get things balanced
>> properly for other numbers of processors. I bet there's another algebraic
>> structure that would capture these other cases, but I'm not sure what it is.
>>
>> 2) I'm working with parallelizing tree structures right now (kd-trees,
>> cover trees, oct-trees, etc.). The real problem is not splitting the
>> number of data points equally (this is easy), but splitting the amount of
>> work equally. Some points take longer to process than others, and this
>> cannot be determined in advance. Therefore, an equal split of the data
>> points can result in one processor getting 25% of the work load, and the
>> second processor getting 75%. Some sort of lazy Partitionable class that
>> was aware of processor loads and didn't split data points until they were
>> needed would be ideal for this scenario.
>>
>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam at gmail.com> wrote:
>>
>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>> > Hi all,
>>> >
>>> > We all know and love Data.Foldable and are familiar with left folds and
>>> > right folds. But what you want in a parallel program is a balanced
>>> fold
>>> > over a tree. Fortunately, many of our datatypes (Sets, Maps) actually
>>> ARE
>>> > balanced trees. Hmm, but how do we expose that?
>>>
>>> Hi Ryan,
>>>
>>> At least for Data.Map, the Foldable instance seems to have a
>>> reasonably balanced fold called fold (or foldMap):
>>>
>>> > fold t = go t
>>> > where go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)
>>>
>>> This doesn't seem to be guaranteed though. For example ghc's derived
>>> instance writes the foldr only, so fold would be right-associated for
>>> a:
>>>
>>> > data T a = B (T a) (T a) | L a deriving (Foldable)
>>>
>>> Regards,
>>> Adam
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130929/6140c98c/attachment.htm>
More information about the Haskell-Cafe
mailing list