Add `take`/`drop`/`splitAt` to `Data.Map`/`Data.Set`

Tue Mar 8 21:18:50 UTC 2016

Agreed. I was just playing devil's advocate.
On Mar 8, 2016 4:14 PM, "Edward Kmett" <ekmett at gmail.com> wrote:

> +1 on adding the methods, but I'd really rather see it done without
> incurring spurious constraints that they don't need.
>
> We just went through and cleaned up a few similar unused and unusable
> constraints in base on various array operations. This seems to beg us to do
> the same later, and we don't bother to wastefully pass in Ord constraints
> on any other combinators in Data.Set or Data.Map, so why start now?
>
> -Edward
>
>
>
> On Mon, Mar 7, 2016 at 7:14 PM, Gabriel Gonzalez <gabriel439 at gmail.com>
> wrote:
>
>> I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map`
>> and `Data.Set` as originally requested in:
>>
>> https://github.com/haskell/containers/issues/135
>>
>> The motivation behind this proposal is three-fold:
>>
>> * for convenience - these functions are commonly used to implement
>> pagination or previews of maps/sets
>> * for type accuracy - the public API impose an unnecessary `Ord`
>> constraint
>> * for efficiency - these can be implemented more efficiently using the
>> internal API
>>
>> Currently the only way you can implement this functionality via the
>> public API is to use `lookupIndex`/`elemAt` + `split`.  For example, one
>> way to implement `Data.Set.take` is:
>>
>>
>> take :: Ord a => Int -> Set a -> Set a
>> take n m
>>     | n      <  0 = empty
>>     | size m <= n = m
>>     | otherwise   = lt
>>   where
>>     (lt, _) = split k m
>>     k       = elemAt n m
>> {-# INLINE take #-}
>>
>>
>> This implementation incurs an unnecessary `Ord` constraint due to a
>> roundabout way of computing `take`: this extracts the element at the given
>> index and then works backwards from the element’s value to partition the
>> set using O(log N) comparisons.  We could eliminate all of the comparisons
>> by using the internal API.
>>
>> Intuitively, we expect that the performance of `Data.Set.take` would
>> benefit from avoiding those unnecessary comparisons and also avoiding
>> traversing the `Set`’s spine twice.  So I tested that hypothesis by
>> implementing `take` via the internal API like this:
>>
>> take :: Int -> Set a -> Set a
>> take n0 s0 = go s0 n0
>>   where
>>     go s@(Bin sz x l r) n =
>>         if sz <= n
>>         then s
>>         else
>>             let sl = size l
>>             in  if n <= sl
>>                 then go l n
>>                 else link x l (go r $! n - sl)
>>     go Tip _ = Tip
>> {-# INLINE take #-}
>>
>>
>> I then added the following benchmark to `benchmarks/Set.hs`:
>>
>> *diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs*
>> *index 3a6e8aa..03c99fb 100644*
>> *--- a/benchmarks/Set.hs*
>> *+++ b/benchmarks/Set.hs*
>> @@ -31,6 +31,7 @@ main = do
>>          , bench "union" $ whnf (S.union s_even) s_odd
>>          , bench "difference" $ whnf (S.difference s) s_even
>>          , bench "intersection" $ whnf (S.intersection s) s_even
>> +        , bench "take" $ whnf (S.take (2^11)) s
>>          , bench "fromList" $ whnf S.fromList elems
>>          , bench "fromList-desc" $ whnf S.fromList (reverse elems)
>>          , bench "fromAscList" $ whnf S.fromAscList elems
>>
>>
>> Here is the performance on my machine when implementing `take` via the
>> public API:
>>
>> benchmarking take
>> time                 272.8 ns   (266.7 ns .. 278.1 ns)
>>                      0.997 R²   (0.996 R² .. 0.998 R²)
>> mean                 266.3 ns   (261.8 ns .. 270.8 ns)
>> std dev              15.44 ns   (13.26 ns .. 18.95 ns)
>> variance introduced by outliers: 75% (severely inflated)
>>
>>
>> … and the performance improved by 61% from using the internal API:
>>
>> benchmarking take
>> time                 169.2 ns   (166.1 ns .. 172.6 ns)
>>                      0.997 R²   (0.996 R² .. 0.998 R²)
>> mean                 172.1 ns   (169.4 ns .. 175.4 ns)
>> std dev              10.68 ns   (8.420 ns .. 15.34 ns)
>> variance introduced by outliers: 78% (severely inflated)
>>
>>
>> … and I’m guessing (but haven’t tested) that the performance gap would
>> only increase the more expensive the comparison function gets.
>>
>> I haven’t performed comparative performance testing for `drop`/`splitAt`
>> nor have I tested `Map` (because the benchmarks take a while for me to
>> build and run) but I can perform those additional comparisons upon requests
>> if people feel they are necessary.
>>
>> I haven’t yet written up a full patch since the maintainer asked me to
>> first run this proposal by the libraries mailing list to assess whether it
>> would be wise to expand the `containers` API to include these utilities.
>>
>> The deadline for discussion is two weeks.
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20160308/fd8abafc/attachment.html>