Proposal #2717: Add nubWith, nubOrd

Mitchell, Neil neil.mitchell.2 at credit-suisse.com
Wed Oct 22 03:59:12 EDT 2008


Hi

> Were we to keep it, do you have a better naming suggestion?

[Warning: untested code]

filterAccum :: (acc -> x -> (acc,Bool)) -> acc -> [x] -> (acc, [x])
filterAccum f a [] = (a, [])
filterAccum f a (x:xs) = (an, [x|b]++rest)
   where (a2,b) = f a x
         (an,rest) = filterAccum f a2 xs

This follows the type of mapAccumL, and is more general than your
function. You could change the utility function to be (acc -> x ->
(acc,Maybe y)) to get a variant that is more general than both mapAccum
and filterAccum.


> could work?  [The problem starts with the belief of everyone 
> I've ever talked to about it that "nub" itself should have 
> been called "uniq". :-)]

Your function has nothing to do with uniqueness or nubness. It is
filtering with a state.

> Of course the technical reason the user needs to pass an 
> empty stop list is that we killed the type class that goes 
> with nubWith, so there's no way to know what the empty 
> accumulator of their stop list type actually is.

This all seems like complexity that shouldn't be there. The base library
should provide simple things (folds, maps, filters) and simple concepts
with very slightly more involved implementations (sort, reverse, nub).
Anything that isn't a simple concept should go in a separate library.

> general, the stop list is an accumulator of values that you 
> want to stop after the first one.

That's how you use it from nubOrd, not anything to do with the function.

> > I'm not even convinced that nubWith really is a nub 
> function, and not 
> > just some generalised filter - filterState perhaps. In which case 
> > you'd want to generalise Maybe b to (Bool,b).
> 
> Or to Either b b?  Which is preferred in this case?

Either b b is a horrible type, its semantically equivalent to (Bool,b)
but with added confusion (in most cases, there are some exceptions). See
my above filterState for a more general version.


> > Can you ever imagine anyone other than nubOrd using nubWith?
> 
> Yes. As discussed previously you can get a significant 
> constant factor speedup with nubInt on IntSets if you need 
> it.  Also, as discussed previously, you can get a speedup for 
> nubChar by using a mutable array as the stop list.

Sounds like great reasons for adding nubInt (or just nubOrd in the
IntMap module). Perhaps there should also be a CharMap module which
provides similar values for Char. I can't imagine your nubChar with a
mutable array handles all Unicode points?

These ideas are starting to be more radical, and sound like the thing a
type class or type families would be suited for. Perhaps prototype these
ideas in a "fastnub" library on Hackage?

> One can also imagine using this in less traditional ways; for example
> 
>   nondescendingSubsequence [] = []
>   nondescendingSubsequence (e : es) =
>     e : nubWith (\a b-> if (a >= b) then Just a else Nothing) e es
> 
>   > nondescendingSubsequence [1,2,3,2,3,3,4,1,1]
>   [1,2,3,3,3,4]
> 
> > Is it a genuine utility function people have been crying out for?
> 
> IMHO it is a "genuine utility function", whatever that means.
> It certainly isn't something "people have been crying out 
> for", so if that's the criterion we should omit it.

I think for the base library that "crying out for" is the minimum
criteria. We also need it to have an obvious name, a well explored
design space (or an obviously trivial design space), and the desire to
support it for all eternity.

> It just seems churlish to hide it, given that it's sitting in 
> there being a perfectly useful building block for things 
> people do cry out for.  I'd guess it would be used about as 
> often as "break", which is in there for much the same reason.

Hoogle uses break 33 times. It used to use it a lot more, but then I
moved to using Parsec. I've never had a desire to use filterState or
nubWith. I'm only one person, so this may not be typical.

> 
> > It seems perfectly good to include as a local function in 
> Data.Set to 
> > be used to implement nubOrd, but I don't see its value as a 
> standalone 
> > export from a very commonly used library full of very useful stuff.
> 
> We could move it to live in Data.Set with nubOrd if folks 
> think that it doesn't belong in Data.List.  Or we could move 
> them both into their own module, although that seems silly to 
> me given that I can't really think what else goes in there.

It's too little for its own module - if you are going to do that just
make it a separate package. I don't want nubWith moved to Data.Set, I
want it hidden. Once its hidden, then it can go wherever (although
because of package scoping rules etc inside Data.Set is the best place)

> Please understand that I'm in no way wedded to the idea of 
> getting any of this in; I want to do what works best for 
> everyone, and am grateful for your and everyone's help in 
> figuring out what that is.

I realise. I really am very enthusiastic about getting nubOrd in (I've
even proposed it previously myself), and appreciate the effort you've
put into this.

Thanks

Neil

==============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer: 

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
==============================================================================



More information about the Libraries mailing list