Add `take`/`drop`/`splitAt` to `Data.Map`/`Data.Set`
David Feuer
david.feuer at gmail.com
Tue Mar 8 21:18:50 UTC 2016
Agreed. I was just playing devil's advocate.
On Mar 8, 2016 4:14 PM, "Edward Kmett" <ekmett at gmail.com> wrote:
> +1 on adding the methods, but I'd really rather see it done without
> incurring spurious constraints that they don't need.
>
> We just went through and cleaned up a few similar unused and unusable
> constraints in base on various array operations. This seems to beg us to do
> the same later, and we don't bother to wastefully pass in Ord constraints
> on any other combinators in Data.Set or Data.Map, so why start now?
>
> -Edward
>
>
>
> On Mon, Mar 7, 2016 at 7:14 PM, Gabriel Gonzalez <gabriel439 at gmail.com>
> wrote:
>
>> I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map`
>> and `Data.Set` as originally requested in:
>>
>> https://github.com/haskell/containers/issues/135
>>
>> The motivation behind this proposal is three-fold:
>>
>> * for convenience - these functions are commonly used to implement
>> pagination or previews of maps/sets
>> * for type accuracy - the public API impose an unnecessary `Ord`
>> constraint
>> * for efficiency - these can be implemented more efficiently using the
>> internal API
>>
>> Currently the only way you can implement this functionality via the
>> public API is to use `lookupIndex`/`elemAt` + `split`. For example, one
>> way to implement `Data.Set.take` is:
>>
>>
>> take :: Ord a => Int -> Set a -> Set a
>> take n m
>> | n < 0 = empty
>> | size m <= n = m
>> | otherwise = lt
>> where
>> (lt, _) = split k m
>> k = elemAt n m
>> {-# INLINE take #-}
>>
>>
>> This implementation incurs an unnecessary `Ord` constraint due to a
>> roundabout way of computing `take`: this extracts the element at the given
>> index and then works backwards from the element’s value to partition the
>> set using O(log N) comparisons. We could eliminate all of the comparisons
>> by using the internal API.
>>
>> Intuitively, we expect that the performance of `Data.Set.take` would
>> benefit from avoiding those unnecessary comparisons and also avoiding
>> traversing the `Set`’s spine twice. So I tested that hypothesis by
>> implementing `take` via the internal API like this:
>>
>> take :: Int -> Set a -> Set a
>> take n0 s0 = go s0 n0
>> where
>> go s@(Bin sz x l r) n =
>> if sz <= n
>> then s
>> else
>> let sl = size l
>> in if n <= sl
>> then go l n
>> else link x l (go r $! n - sl)
>> go Tip _ = Tip
>> {-# INLINE take #-}
>>
>>
>> I then added the following benchmark to `benchmarks/Set.hs`:
>>
>> *diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs*
>> *index 3a6e8aa..03c99fb 100644*
>> *--- a/benchmarks/Set.hs*
>> *+++ b/benchmarks/Set.hs*
>> @@ -31,6 +31,7 @@ main = do
>> , bench "union" $ whnf (S.union s_even) s_odd
>> , bench "difference" $ whnf (S.difference s) s_even
>> , bench "intersection" $ whnf (S.intersection s) s_even
>> + , bench "take" $ whnf (S.take (2^11)) s
>> , bench "fromList" $ whnf S.fromList elems
>> , bench "fromList-desc" $ whnf S.fromList (reverse elems)
>> , bench "fromAscList" $ whnf S.fromAscList elems
>>
>>
>> Here is the performance on my machine when implementing `take` via the
>> public API:
>>
>> benchmarking take
>> time 272.8 ns (266.7 ns .. 278.1 ns)
>> 0.997 R² (0.996 R² .. 0.998 R²)
>> mean 266.3 ns (261.8 ns .. 270.8 ns)
>> std dev 15.44 ns (13.26 ns .. 18.95 ns)
>> variance introduced by outliers: 75% (severely inflated)
>>
>>
>> … and the performance improved by 61% from using the internal API:
>>
>> benchmarking take
>> time 169.2 ns (166.1 ns .. 172.6 ns)
>> 0.997 R² (0.996 R² .. 0.998 R²)
>> mean 172.1 ns (169.4 ns .. 175.4 ns)
>> std dev 10.68 ns (8.420 ns .. 15.34 ns)
>> variance introduced by outliers: 78% (severely inflated)
>>
>>
>> … and I’m guessing (but haven’t tested) that the performance gap would
>> only increase the more expensive the comparison function gets.
>>
>> I haven’t performed comparative performance testing for `drop`/`splitAt`
>> nor have I tested `Map` (because the benchmarks take a while for me to
>> build and run) but I can perform those additional comparisons upon requests
>> if people feel they are necessary.
>>
>> I haven’t yet written up a full patch since the maintainer asked me to
>> first run this proposal by the libraries mailing list to assess whether it
>> would be wise to expand the `containers` API to include these utilities.
>>
>> The deadline for discussion is two weeks.
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20160308/fd8abafc/attachment.html>
More information about the Libraries
mailing list