[trac@galois.com: Re: [GHC] #1218: Add sortNub and sortNubBy to Data.List]

John Meacham john at repetae.net
Wed Mar 21 17:06:39 EDT 2007


On Mon, Mar 19, 2007 at 11:22:47AM +1100, Duncan Coutts wrote:
> On Mon, 2007-03-19 at 11:02 +1100, Donald Bruce Stewart wrote:
> > I propose we *do not* change the api to add the special case:
> > 
> >     sortNub = sort . nub
> >             = map head . group . sort
> > 
> > and instead add a rewrite rule to Data.List to provide this
> > optimisation.
> > 
> >  {-# RULES
> >  "sort/nub" sort . nub = map head . group . sort
> >    #-}
> 
> Though of course we'd want to use the more efficient implementation than
> that. One can write a sort that directly eliminates duplicates, and I
> think you'd want a rule for nub . sort as well as sort . nub.
> 
> And while we're at it we can add a rule to rewrite all the places people
> have used map head . group . sort (or the goupBy f . sortBy f
> equivalents) to this equals-eliminating sort implementation.

note, this is most likely not what you want at all:

> >  {-# RULES
> >  "sort/nub" sort . nub = map head . group . sort
> >    #-}            

desugars to 
> >  {-# RULES
> >  "sort/nub" (.) sort nub = map head . group . sort
> >    #-}            

meaning you are actually adding a RULE to (.), not sort or nub. this
hasa couple of unfortunate consequences.

* people will have to use 'sort . nub' rather than something like sort $ nub xs
* (.) will no longer be inlined until much later in the optimization
 passes, which perhaps could have disasterous consequences

that said, I do not believe this is an appropriate use of rules to begin
with, it changes the asymptotic complexity, so should be available as an
explicit option, and is just a generally useful routine. not that rules
can't also exist, but attached to the right thing.

(sort (nub xs)) = nubSort xs
(nub (sort xs)) = nubSort xs


now, I would also like to see (and it might turn out to be more useful),

fastNub, which is a nub that uses Ord and sets to be just as fast, but
works on infinite lists. accidentally evaluating a whole list in memory
is the quickest way to a space leak and 'sort' is the most common
culprit. this is why 'nub' can often be _faster_ than sortNub when you
only partially evaluate a list, with ordNub, you get the benefits of
both!

        John 



-- 
John Meacham - ⑆repetae.net⑆john⑈


More information about the Libraries mailing list