Faster `elem` "Hack" for Data.Set?

David Feuer david.feuer at gmail.com
Sun Jan 30 00:59:51 UTC 2022


The tighter `Foldable` instance probably wouldn't hurt too much in
practice. I have some other concerns, however. One is that
`containers` has a long history of attempting to remain compatible
with potential future Haskell 98-like Haskell implementations (likely
with a few days of work to patch up mistakes when the time comes).
Using a GADT to define Set would break that quite thoroughly. My
second concern is more practical:

  type Set a = SetImpl a a

is all sorts of problematic. It forces anyone who wants to use what's
always been the `Set` type constructor (Set :: Type -> Type) to dig
into the implementation and use `SetImpl :: Type -> Type -> Type`
instead. That's quite a different beast. I don't know all the things
that will break, but they'll surely include
https://hackage.haskell.org/package/constrained-monads-0.5.0.0/docs/Control-Monad-Constrained.html
. I think the real answer is to remove `elem` from `Foldable` and put
it somewhere more appropriate.

On Sat, Jan 29, 2022 at 7:26 PM Viktor Dukhovni <ietf-dane at dukhovni.org> wrote:
>
>
> In a recent "Folding the unfoldable" "gist":
>
>     https://oleg.fi/gists/posts/2022-01-25-folding-unfoldable.html
>
> Oleg describes a way to make unboxed Vectors Foldable:
>
>     There is another way to make Foldable work, with a
>
>         data Hack a b where
>             Hack :: U.Vector a -> Hack a a
>
>     This is a two type-parameter wrapper, but the types are always the
>     same! (I wish that could be a newtype). The Foldable instance is
>     simply:
>
>         instance U.Unbox a => Foldable (Hack a) where
>             foldr f z (Hack v)  = U.foldr f z v
>             foldl' f z (Hack v) = U.foldl' f z v
>
> In the associated Reddit thread
>
>     https://www.reddit.com/r/haskell/comments/sd6gel/comment/hudmsis/?utm_source=share&utm_medium=web2x&context=3
>
> it was observed that a similar approach can be used to give Data.Set a
> performant `elem` method (code copied below my signature).
>
> It is somewhat tempting to consider whether the "Hack" ought to be
> built-in directly into the real Data.Set:
>
>     https://www.reddit.com/r/haskell/comments/sd6gel/comment/huil070/?utm_source=share&utm_medium=web2x&context=3
>
>     type Set a = SetImpl a a
>     data SetImpl a b where
>         Bin :: {-# UNPACK #-} !Size -> !a -> !(SetImpl a) -> !(SetImpl a) -> SetImpl a a
>         Tip :: SetImpl a a
>     type Size     = Int
>
>     instance Ord a => Foldable (SetImpl a) where
>         ...
>         elem _ Tip = False
>         elem x (Bin _ y l r) = case compare x y of
>                EQ -> True
>                LT -> elem x l
>                GT -> elem x r
>         ...
>
> This representation does not seem to carry any obvious runtime overhead,
> and gives Set a performant `elem` method. The only change from the
> status quo would be that folds would not be available for empty or
> singleton sets with non-Ord elements (currently possible with Set).
>
> How much of a loss would it be to constrain the element type of empty
> and singleton sets used in folds?  Are there other issue that make the
> above impractical?
>
> --
>     Viktor.
>
>     type OSet a = OrdSet a a
>     data OrdSet a b where
>         OSet :: Set.Set a -> OrdSet a a
>
>     instance Ord a => Foldable (OrdSet a) where
>         fold (OSet xs) = fold xs
>         foldMap f (OSet xs) = foldMap f xs
>         foldMap' f (OSet xs) = foldMap' f xs
>         foldr f z (OSet xs) = foldr f z xs
>         foldr' f z (OSet xs) = foldr' f z xs
>         foldl f z (OSet xs) = foldl f z xs
>         foldl' f z (OSet xs) = foldl' f z xs
>         foldr1 f (OSet xs) = foldr1 f xs
>         foldl1 f (OSet xs) = foldl1 f xs
>         toList (OSet xs) = toList xs
>         null (OSet xs) = null xs
>         length (OSet xs) = length xs
>         -- The point of the exercise is `elem`
>         elem e (OSet xs) = Set.member e xs
>         maximum (OSet xs) = maximum xs
>         minimum (OSet xs) = minimum xs
>         sum (OSet xs) = sum xs
>         product (OSet xs) = product xs
>
>     instance Eq a => Eq (OSet a) where
>         (OSet xs) == (OSet ys) = xs == ys
>
>     instance (Eq a, Ord a) => Ord (OSet a) where
>         compare (OSet xs) (OSet ys) = compare xs ys
>
>     empty :: Ord a => OSet a
>     empty = OSet Set.empty
>
>     singleton :: Ord a => a -> OSet a
>     singleton = OSet . Set.singleton
>
>     fromList :: Ord a => [a] -> OSet a
>     fromList = OSet . Set.fromList
>
>     (\\) :: Ord a => OSet a -> OSet a -> OSet a
>     (OSet xs) \\ (OSet ys) = OSet (xs Set.\\ ys)
>
>     alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> OSet a -> f (OSet a)
>     alterF f x (OSet xs) = fmap OSet $ Set.alterF f x xs
>
>     ...
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries


More information about the Libraries mailing list