Proposal: add ordNub somewhere in containers

Gershom B gershomb at gmail.com
Wed Oct 18 21:28:37 UTC 2017


I agree that `ordNubOn` suffices and we don't need `ordNubBy` since
there's nothing lawful you can do with the latter that you can't do
with the former. I'm indifferent on the NonEmpty and Seq case as I
don't suspect that they will yield much more efficient implementations
than going via lists, especially if we setup (and we should!) the
fusion rules correctly. I have no objection to adding them for
completeness however.

If we do add them, then the proposed module name
`Data.Containers.ListUtils` becomes slightly less appropriate, but I
think still fine, since these are "morally" all lists of various
sorts.

-g


On Wed, Oct 18, 2017 at 1:49 PM, David Feuer <david.feuer at gmail.com> wrote:
> I am convinced that we should add
>
> ordNub :: Ord a => [a] -> [a]
> ordNubOn :: Ord b => (a -> b) -> [a] -> [b]
> intNub :: [Int] -> [Int]
> intNubOn :: (a -> Int) -> [a] -> [a]
>
> And because nub preserves non-emptiness, I believe we should also offer
>
> ordNub1 :: Ord a => NonEmpty a -> NonEmpty a
> ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a
> intNub1 :: NonEmpty Int -> NonEmpty Int
> intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a
>
> I imagine we should also add these operations for Data.Sequence.Seq.
>
> I'm not yet convinced that we should add
>
> ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]
>
> but I'm open to further discussion of that question. My main concern is that
> the properties of the comparison argument require careful documentation. In
> its favor, using it improperly cannot *expose* a broken Set to later
> operations.
>
> I would very much like to hear further bikeshedding around names and
> namespaces.
>
> On Oct 16, 2017 6:18 PM, "Gershom B" <gershomb at gmail.com> wrote:
>>
>> There have been many discussions over the years about adding an
>> efficient order preserving nub somewhere to our core libraries. It
>> always comes down to the same issue: an efficient nub wants to be
>> backed by an efficient `Set`, but the API of the `nub` itself doesn't
>> make reference to any other data structures besides lists. So it feels
>> a bit conceptually strange to put an efficient nub anywhere besides
>> `Data.List` even though it can't go there without inverting our
>> dependency tree in a weird way or inlining an efficient set
>> implementation into the middle of it.
>>
>> Nonetheless, the convenience of having a good `nub` lying around in a
>> core library is undeniable, and after writing the "usual" one in my
>> code for the zillionth time, I decided to raise an issue about it:
>>
>> https://github.com/haskell/containers/issues/439
>>
>> I was promptly directed here to make a proper proposal.
>>
>> So, here:
>>
>> 1) I propose two new functions,
>>
>> `ordNub` and `intNub`
>>
>> with the standard implementation (from
>> https://github.com/nh2/haskell-ordnub):
>>
>> import qualified Data.Set as Set
>>
>> ordNub :: (Ord a) => [a] -> [a]
>> ordNub l = go Set.empty l
>>   where
>>     go _ [] = []
>>     go s (x:xs) = if x `Set.member` s then go s xs
>>                                       else x : go (Set.insert x s) xs
>>
>> and the same implementation, but specialized to `Int` and using `IntSet`s.
>>
>> The rationale for the names is that the former has a long history of
>> use in folklore, and the latter is the obvious specialization of it.
>>
>> 2) I propose these functions be added to a new module in the
>> `containers` library: `Data.Containers.ListUtils`. This can also
>> potentially in the future add efficient list intersection, etc. as
>> documented on the above reference link.
>>
>> The rationale for the new module is that it can provide a meaningful
>> home for such functions which operate on lists, but require other data
>> structures to be implemented efficiently...
>>
>> Discussion period: 2 weeks.
>>
>> --Gershom
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries


More information about the Libraries mailing list