Proposal: add ordNub somewhere in containers

Herbert Valerio Riedel hvriedel at gmail.com
Wed Oct 18 12:35:26 UTC 2017


On 2017-10-16 at 18:17:19 -0400, Gershom B wrote:

[...]

> 1) I propose two new functions,
>
> `ordNub` and `intNub`
>
> with the standard implementation (from https://github.com/nh2/haskell-ordnub):
>
> import qualified Data.Set as Set
>
> ordNub :: (Ord a) => [a] -> [a]
> ordNub l = go Set.empty l
>   where
>     go _ [] = []
>     go s (x:xs) = if x `Set.member` s then go s xs
>                                       else x : go (Set.insert x s) xs
>
> and the same implementation, but specialized to `Int` and using `IntSet`s.

I'm generally +1 for an 'ordNub'; I don't mind the particular color of
the shed.

While at it, I'd like to use the occasion to point out two minor things
with that specific implementation:

 a) One thing I've been missing from `container's API are "UPSERT"-ish
    operations; In the case of `Set`, the implementation above performs
    a lookup, and then has to start the insertion from scratch.

    If we had something like,

       Set.tryInsert :: Ord a => a -> Set a -> (Set a, Bool)

    or

       Set.tryInsert :: Ord a => a -> Set a -> Maybe (Set a)

    `ordNub` would benefit.


 b) This is more of an observation, as there's not much we can do here
    with little effort:

    iirc, `nub` has a worst-case space complexity (if the input list is
    made up of N distinct values, e.g. (`nub [1..n] :: [Int])) of 3N
    words for the [a]-spine to keep track of the already seen values.

    Using `Set` however trades time-complexity for memory-complexity,
    and needs 5N words for the 'Set a'-spine.

    c.f. http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html


-- hvr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 818 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20171018/b20cd08a/attachment.sig>


More information about the Libraries mailing list