Add `take`/`drop`/`splitAt` to `Data.Map`/`Data.Set`

Dan Burton danburton.email at gmail.com
Tue Mar 8 00:26:12 UTC 2016


I would prefer that the Ord constraint be retained in the type signature,
even if not used in the implementation. Sets and Maps conceptually do not
have an ordering; the Ord constraint indicates in which order one is
sequencing the values.

-- Dan Burton

On Mon, Mar 7, 2016 at 4:14 PM, Gabriel Gonzalez <gabriel439 at gmail.com>
wrote:

> I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map`
> and `Data.Set` as originally requested in:
>
> https://github.com/haskell/containers/issues/135
>
> The motivation behind this proposal is three-fold:
>
> * for convenience - these functions are commonly used to implement
> pagination or previews of maps/sets
> * for type accuracy - the public API impose an unnecessary `Ord` constraint
> * for efficiency - these can be implemented more efficiently using the
> internal API
>
> Currently the only way you can implement this functionality via the public
> API is to use `lookupIndex`/`elemAt` + `split`.  For example, one way to
> implement `Data.Set.take` is:
>
>
> take :: Ord a => Int -> Set a -> Set a
> take n m
>     | n      <  0 = empty
>     | size m <= n = m
>     | otherwise   = lt
>   where
>     (lt, _) = split k m
>     k       = elemAt n m
> {-# INLINE take #-}
>
>
> This implementation incurs an unnecessary `Ord` constraint due to a
> roundabout way of computing `take`: this extracts the element at the given
> index and then works backwards from the element’s value to partition the
> set using O(log N) comparisons.  We could eliminate all of the comparisons
> by using the internal API.
>
> Intuitively, we expect that the performance of `Data.Set.take` would
> benefit from avoiding those unnecessary comparisons and also avoiding
> traversing the `Set`’s spine twice.  So I tested that hypothesis by
> implementing `take` via the internal API like this:
>
> take :: Int -> Set a -> Set a
> take n0 s0 = go s0 n0
>   where
>     go s@(Bin sz x l r) n =
>         if sz <= n
>         then s
>         else
>             let sl = size l
>             in  if n <= sl
>                 then go l n
>                 else link x l (go r $! n - sl)
>     go Tip _ = Tip
> {-# INLINE take #-}
>
>
> I then added the following benchmark to `benchmarks/Set.hs`:
>
> *diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs*
> *index 3a6e8aa..03c99fb 100644*
> *--- a/benchmarks/Set.hs*
> *+++ b/benchmarks/Set.hs*
> @@ -31,6 +31,7 @@ main = do
>          , bench "union" $ whnf (S.union s_even) s_odd
>          , bench "difference" $ whnf (S.difference s) s_even
>          , bench "intersection" $ whnf (S.intersection s) s_even
> +        , bench "take" $ whnf (S.take (2^11)) s
>          , bench "fromList" $ whnf S.fromList elems
>          , bench "fromList-desc" $ whnf S.fromList (reverse elems)
>          , bench "fromAscList" $ whnf S.fromAscList elems
>
>
> Here is the performance on my machine when implementing `take` via the
> public API:
>
> benchmarking take
> time                 272.8 ns   (266.7 ns .. 278.1 ns)
>                      0.997 R²   (0.996 R² .. 0.998 R²)
> mean                 266.3 ns   (261.8 ns .. 270.8 ns)
> std dev              15.44 ns   (13.26 ns .. 18.95 ns)
> variance introduced by outliers: 75% (severely inflated)
>
>
> … and the performance improved by 61% from using the internal API:
>
> benchmarking take
> time                 169.2 ns   (166.1 ns .. 172.6 ns)
>                      0.997 R²   (0.996 R² .. 0.998 R²)
> mean                 172.1 ns   (169.4 ns .. 175.4 ns)
> std dev              10.68 ns   (8.420 ns .. 15.34 ns)
> variance introduced by outliers: 78% (severely inflated)
>
>
> … and I’m guessing (but haven’t tested) that the performance gap would
> only increase the more expensive the comparison function gets.
>
> I haven’t performed comparative performance testing for `drop`/`splitAt`
> nor have I tested `Map` (because the benchmarks take a while for me to
> build and run) but I can perform those additional comparisons upon requests
> if people feel they are necessary.
>
> I haven’t yet written up a full patch since the maintainer asked me to
> first run this proposal by the libraries mailing list to assess whether it
> would be wise to expand the `containers` API to include these utilities.
>
> The deadline for discussion is two weeks.
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20160307/f6342d86/attachment-0001.html>


More information about the Libraries mailing list