Add `take`/`drop`/`splitAt` to `Data.Map`/`Data.Set`
Roman Cheplyaka
roma at ro-che.info
Tue Mar 8 18:54:45 UTC 2016
Good idea, +1.
On 03/08/2016 02:14 AM, Gabriel Gonzalez wrote:
> I would like to propose adding `take`/`drop`/`splitAt` to both
> `Data.Map` and `Data.Set` as originally requested in:
>
> https://github.com/haskell/containers/issues/135
>
> The motivation behind this proposal is three-fold:
>
> * for convenience - these functions are commonly used to implement
> pagination or previews of maps/sets
> * for type accuracy - the public API impose an unnecessary `Ord` constraint
> * for efficiency - these can be implemented more efficiently using the
> internal API
>
> Currently the only way you can implement this functionality via the
> public API is to use `lookupIndex`/`elemAt` + `split`. For example, one
> way to implement `Data.Set.take` is:
>
>
> take :: Ord a => Int -> Set a -> Set a
> take n m
> | n < 0 = empty
> | size m <= n = m
> | otherwise = lt
> where
> (lt, _) = split k m
> k = elemAt n m
> {-# INLINE take #-}
>
>
> This implementation incurs an unnecessary `Ord` constraint due to a
> roundabout way of computing `take`: this extracts the element at the
> given index and then works backwards from the element’s value to
> partition the set using O(log N) comparisons. We could eliminate all of
> the comparisons by using the internal API.
>
> Intuitively, we expect that the performance of `Data.Set.take` would
> benefit from avoiding those unnecessary comparisons and also avoiding
> traversing the `Set`’s spine twice. So I tested that hypothesis by
> implementing `take` via the internal API like this:
>
> take :: Int -> Set a -> Set a
> take n0 s0 = go s0 n0
> where
> go s@(Bin sz x l r) n =
> if sz <= n
> then s
> else
> let sl = size l
> in if n <= sl
> then go l n
> else link x l (go r $! n - sl)
> go Tip _ = Tip
> {-# INLINE take #-}
>
>
> I then added the following benchmark to `benchmarks/Set.hs`:
>
> *diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs*
> *index 3a6e8aa..03c99fb 100644*
> *--- a/benchmarks/Set.hs*
> *+++ b/benchmarks/Set.hs*
> @@ -31,6 +31,7 @@main = do
> , bench "union" $ whnf (S.union s_even) s_odd
> , bench "difference" $ whnf (S.difference s) s_even
> , bench "intersection" $ whnf (S.intersection s) s_even
> + , bench "take" $ whnf (S.take (2^11)) s
> , bench "fromList" $ whnf S.fromList elems
> , bench "fromList-desc" $ whnf S.fromList (reverse elems)
> , bench "fromAscList" $ whnf S.fromAscList elems
>
>
> Here is the performance on my machine when implementing `take` via the
> public API:
>
> benchmarking take
> time 272.8 ns (266.7 ns .. 278.1 ns)
> 0.997 R² (0.996 R² .. 0.998 R²)
> mean 266.3 ns (261.8 ns .. 270.8 ns)
> std dev 15.44 ns (13.26 ns .. 18.95 ns)
> variance introduced by outliers: 75% (severely inflated)
>
>
> … and the performance improved by 61% from using the internal API:
>
> benchmarking take
> time 169.2 ns (166.1 ns .. 172.6 ns)
> 0.997 R² (0.996 R² .. 0.998 R²)
> mean 172.1 ns (169.4 ns .. 175.4 ns)
> std dev 10.68 ns (8.420 ns .. 15.34 ns)
> variance introduced by outliers: 78% (severely inflated)
>
>
> … and I’m guessing (but haven’t tested) that the performance gap would
> only increase the more expensive the comparison function gets.
>
> I haven’t performed comparative performance testing for `drop`/`splitAt`
> nor have I tested `Map` (because the benchmarks take a while for me to
> build and run) but I can perform those additional comparisons upon
> requests if people feel they are necessary.
>
> I haven’t yet written up a full patch since the maintainer asked me to
> first run this proposal by the libraries mailing list to assess whether
> it would be wise to expand the `containers` API to include these utilities.
>
> The deadline for discussion is two weeks.
>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
More information about the Libraries
mailing list