Proposal: add ordNub somewhere in containers

Wed Oct 18 22:22:17 UTC 2017

On Wed, Oct 18, 2017 at 10:49 AM, David Feuer <david.feuer at gmail.com> wrote:
> I am convinced that we should add
>
> ordNub :: Ord a => [a] -> [a]
> ordNubOn :: Ord b => (a -> b) -> [a] -> [b]
> intNub :: [Int] -> [Int]
> intNubOn :: (a -> Int) -> [a] -> [a]

+1 to ordNub / ordNubOn, and having the documentation for nub mention
that you probably want ordNub in nearly all cases.

+1 to intNub / intNubOn, with RULES pragma to specialize ordNub /
ordNubOn to it.

> And because nub preserves non-emptiness, I believe we should also offer
>
> ordNub1 :: Ord a => NonEmpty a -> NonEmpty a
> ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a
> intNub1 :: NonEmpty Int -> NonEmpty Int
> intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I wish these could go in base.  It would be export these from
Data.List.NonEmpty, using the same names as the list operations, but
this is not possible due to the implementation in terms of Set.

> I imagine we should also add these operations for Data.Sequence.Seq.

+1

> I'm not yet convinced that we should add
>
> ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

Such an operation is quite useful, though unfortunately it does not
fit naturally into an implementation in terms of Set.  There are quite
reasonable use-cases here, where even a hack with a new datatype could
be quite convoluted.    I recently used a "sortNubBy" in a client
project.  Perhaps that should be another proposal - it is something
that could be in base, though less efficient.  In particular, this
function is useful in cases where the ordering depends on some value
other than the value of the element.  Another, more common case, is
ignoring some distinctions between values, such as ignoring a field.
Sure, you can make a newtype and use "coerce" for this, but I don't
like using "coerce" because in general it can break invariants.  Much
better than unsafeCoerce, of course, but still unsafe.  I'd rather not
create an Ord instance just to use it in one place, it seems like
unnecessary plumbing.

> but I'm open to further discussion of that question. My main concern is that
> the properties of the comparison argument require careful documentation. In
> its favor, using it improperly cannot *expose* a broken Set to later
> operations.
>
> I would very much like to hear further bikeshedding around names and
> namespaces.
>
> On Oct 16, 2017 6:18 PM, "Gershom B" <gershomb at gmail.com> wrote:
>>
>> There have been many discussions over the years about adding an
>> efficient order preserving nub somewhere to our core libraries. It
>> always comes down to the same issue: an efficient nub wants to be
>> backed by an efficient `Set`, but the API of the `nub` itself doesn't
>> make reference to any other data structures besides lists. So it feels
>> a bit conceptually strange to put an efficient nub anywhere besides
>> `Data.List` even though it can't go there without inverting our
>> dependency tree in a weird way or inlining an efficient set
>> implementation into the middle of it.
>>
>> Nonetheless, the convenience of having a good `nub` lying around in a
>> core library is undeniable, and after writing the "usual" one in my
>> code for the zillionth time, I decided to raise an issue about it:
>>
>> https://github.com/haskell/containers/issues/439
>>
>> I was promptly directed here to make a proper proposal.
>>
>> So, here:
>>
>> 1) I propose two new functions,
>>
>> `ordNub` and `intNub`
>>
>> with the standard implementation (from
>> https://github.com/nh2/haskell-ordnub):
>>
>> import qualified Data.Set as Set
>>
>> ordNub :: (Ord a) => [a] -> [a]
>> ordNub l = go Set.empty l
>>   where
>>     go _ [] = []
>>     go s (x:xs) = if x `Set.member` s then go s xs
>>                                       else x : go (Set.insert x s) xs
>>
>> and the same implementation, but specialized to `Int` and using `IntSet`s.
>>
>> The rationale for the names is that the former has a long history of
>> use in folklore, and the latter is the obvious specialization of it.
>>
>> 2) I propose these functions be added to a new module in the
>> `containers` library: `Data.Containers.ListUtils`. This can also
>> potentially in the future add efficient list intersection, etc. as
>> documented on the above reference link.
>>
>> The rationale for the new module is that it can provide a meaningful
>> home for such functions which operate on lists, but require other data
>> structures to be implemented efficiently...
>>
>> Discussion period: 2 weeks.
>>
>> --Gershom
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>