Standard Library report: List union

Simon Peyton-Jones simonpj@microsoft.com
Mon, 11 Mar 2002 05:04:28 -0800

```There's a remark at the beginning of 7.2 that says:

delete, (\\), union and intersect preserve the invariant=20
that lists don't contain duplicates, provided that=20
their first argument contains no duplicates.

The same applies to unionBy etc.   This design is one
you might reasonably disagree with.  I'd have thought
it would be more sensible to have the invariant that
*both* arguments to union and intersect are assumed
to be sets (no dups).  But I don't propose to change this now.

Simon

| -----Original Message-----
| From: Jay Cox [mailto:sqrtofone@yahoo.com]=20
| Sent: 05 March 2002 01:51
| To: Jon Fairbairn
| Subject: Re: Standard Library report: List union
|=20
|=20
| On Mon, 4 Mar 2002, Jon Fairbairn wrote:
|=20
| > The current library report defines unionBy like this:
| >
| >   unionBy eq xs ys =3D  xs ++ deleteFirstsBy eq (nubBy eq ys) xs
| >
| > why does it take the nub of ys, but not xs?  I'd have expected
| >
| >   unionBy eq xs ys =3D  (nubBy eq xs) ++ deleteFirstsBy eq=20
| (nubBy eq ys)=20
| > xs
| >
| >   J=F3n
|=20
| Pure guess, but... (no really!)
|=20
|=20
| For the sake of argument, lets define a ulist as a list where=20
| for all elements x in list l, there is no element n with=20
| index not equal to that of x (index being position of the=20
| element in the list) such that eq n x =3D=3D True.
|=20
| In other words every element in a ulist appears only once.
|=20
| Suppose you (unionBy eq x y) to get a result.
| Suppose also that x is a ulist
| A. x is a ulist by argument.
| B. the result of (nubBy eq ys), lets call it z, is a ulist.
| C. the result of (deleteFirstsBye eq z xs) is a list which=20
| has no elements in common with xs).  because (deleteFirstsBy=20
| eq) "deletes" elements and doesnt add,the result is a ulist.=20
| D. Adding new, unique, elements (elements not equal to a=20
| element in the ulist in question) to a ulist results in a=20
| ulist. E. Therefore (unionBy eq x y) is a ulist.
|=20
|=20
| Why should this be important?
|=20
| what if you wanted to fold the function (unionBy eq) over a=20
| list of lists to get a ulist?  Assuming you start with an=20
| initial ulist, by your suggestion you'd be applying (nubBy=20
| eq) to a ulist (generated by the the repeated application=20
| (unionByeq), which would be the same as applying the identity=20
| function to a ulist. (meaning you have essentially one big=20
| nasty no-op)!
|=20
|=20
| However, in taking a look at unionBy, one might wonder why it=20
| isnt defined like so (assuming xs would be the accumulated=20
| ulist in the fold.
|=20
|=20
| unionBy eq xs ys =3D (deleteFirstsBy eq (nubBy eq ys) xs) ++ xs
|=20
| or maybe better (to mirror (++))
|=20
| unionBy' eq ys xs =3D (deleteFirstsBy eq (nubBy eq ys) xs) ++ xs
|=20
| in using this definition, the number of conses with (:)=20
| should be linear (or less) with with the number of elements=20
| to be added to the first_ulist in the following folds.
|=20
|=20
| foldl (unionBy eq)  first_ulist list_of_lists
| foldr (unionBy' eq) first_ulist list_of_lists
|=20
| So, is there aother reason I cannot think of?  I'm sure I=20
| haven't covered all bases here.
|=20
| Thanks,
|=20
| Jay Cox
|=20
| _______________________________________________