# nubBy, groupBy specification

Cale Gibbard cgibbard at gmail.com
Sat Jan 2 19:03:52 EST 2010

2010/1/2  <a.biurvOir4 at asuhan.com>:
> I'm inclined to agree with you. Could you please post the sources to both nubBy
> and groupBy to see what has changed?
>
> Thanks,
> Kim-Ee
> yeoh at cs.wisc.edu

Sure thing,

Here is the report's implementation of nubBy:

-------------------
nubBy            :: (a -> a -> Bool) -> [a] -> [a]
nubBy eq []      =  []
nubBy eq (x:xs)  =  x : nubBy eq (filter (\y -> not (eq x y)) xs)
-------------------

And here is the implementation in GHC 6.10.4 and GHC 6.12.1:

-------------------
nubBy eq l              = nubBy' l []
where
nubBy' [] _         = []
nubBy' (y:ys) xs
| elem_by eq y xs = nubBy' ys xs
| otherwise       = y : nubBy' ys (y:xs)

-- Not exported:
-- Note that we keep the call to `eq` with arguments in the
-- same order as in the reference implementation
-- 'xs' is the list of things we've seen so far,
-- 'y' is the potential new element
elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool
elem_by _  _ []         =  False
elem_by eq y (x:xs)     =  y `eq` x || elem_by eq y xs
-------------------

Note that the comment is actually incorrect. The report implementation
of nubBy applies the function such that earlier accepted elements of
the list are the left parameter to eq. Simply switching the  y `eq` x
to become  x `eq` y in elem_by produces something which passes
QuickCheck tests at the very least. I believe it's really the right
thing at that point.

The implementation of groupBy in 6.10.4 and 6.12.1 is the same as the
report version, and I'm happy with it:

-------------------
groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs
-------------------

My main concern with it is that the report says:

-------------------
When the "By" function replaces an Eq context by a binary predicate,
the predicate is assumed to define an equivalence
-------------------

which I think is much too strong an assumption given that these
functions have reasonably natural and quite useful extensions to
non-equivalence relations, which the code provided by the Report
actually implements.

- Cale