Faster `elem` "Hack" for Data.Set?

Viktor Dukhovni ietf-dane at dukhovni.org
Sun Jan 30 00:25:16 UTC 2022


In a recent "Folding the unfoldable" "gist":

    https://oleg.fi/gists/posts/2022-01-25-folding-unfoldable.html

Oleg describes a way to make unboxed Vectors Foldable:

    There is another way to make Foldable work, with a

        data Hack a b where
            Hack :: U.Vector a -> Hack a a

    This is a two type-parameter wrapper, but the types are always the
    same! (I wish that could be a newtype). The Foldable instance is
    simply:

        instance U.Unbox a => Foldable (Hack a) where
            foldr f z (Hack v)  = U.foldr f z v
            foldl' f z (Hack v) = U.foldl' f z v

In the associated Reddit thread

    https://www.reddit.com/r/haskell/comments/sd6gel/comment/hudmsis/?utm_source=share&utm_medium=web2x&context=3

it was observed that a similar approach can be used to give Data.Set a
performant `elem` method (code copied below my signature).

It is somewhat tempting to consider whether the "Hack" ought to be
built-in directly into the real Data.Set:

    https://www.reddit.com/r/haskell/comments/sd6gel/comment/huil070/?utm_source=share&utm_medium=web2x&context=3
   
    type Set a = SetImpl a a
    data SetImpl a b where
        Bin :: {-# UNPACK #-} !Size -> !a -> !(SetImpl a) -> !(SetImpl a) -> SetImpl a a
        Tip :: SetImpl a a
    type Size     = Int

    instance Ord a => Foldable (SetImpl a) where
        ...
        elem _ Tip = False
        elem x (Bin _ y l r) = case compare x y of
               EQ -> True
               LT -> elem x l
               GT -> elem x r
        ... 

This representation does not seem to carry any obvious runtime overhead,
and gives Set a performant `elem` method. The only change from the
status quo would be that folds would not be available for empty or
singleton sets with non-Ord elements (currently possible with Set).

How much of a loss would it be to constrain the element type of empty
and singleton sets used in folds?  Are there other issue that make the
above impractical?

-- 
    Viktor.

    type OSet a = OrdSet a a
    data OrdSet a b where
        OSet :: Set.Set a -> OrdSet a a

    instance Ord a => Foldable (OrdSet a) where
        fold (OSet xs) = fold xs
        foldMap f (OSet xs) = foldMap f xs
        foldMap' f (OSet xs) = foldMap' f xs
        foldr f z (OSet xs) = foldr f z xs
        foldr' f z (OSet xs) = foldr' f z xs
        foldl f z (OSet xs) = foldl f z xs
        foldl' f z (OSet xs) = foldl' f z xs
        foldr1 f (OSet xs) = foldr1 f xs                                                                                                                   
        foldl1 f (OSet xs) = foldl1 f xs
        toList (OSet xs) = toList xs
        null (OSet xs) = null xs
        length (OSet xs) = length xs
        -- The point of the exercise is `elem`
        elem e (OSet xs) = Set.member e xs
        maximum (OSet xs) = maximum xs
        minimum (OSet xs) = minimum xs
        sum (OSet xs) = sum xs
        product (OSet xs) = product xs

    instance Eq a => Eq (OSet a) where
        (OSet xs) == (OSet ys) = xs == ys

    instance (Eq a, Ord a) => Ord (OSet a) where
        compare (OSet xs) (OSet ys) = compare xs ys

    empty :: Ord a => OSet a
    empty = OSet Set.empty

    singleton :: Ord a => a -> OSet a
    singleton = OSet . Set.singleton

    fromList :: Ord a => [a] -> OSet a
    fromList = OSet . Set.fromList
     
    (\\) :: Ord a => OSet a -> OSet a -> OSet a
    (OSet xs) \\ (OSet ys) = OSet (xs Set.\\ ys)

    alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> OSet a -> f (OSet a)
    alterF f x (OSet xs) = fmap OSet $ Set.alterF f x xs

    ...


More information about the Libraries mailing list