[Haskell-cafe] nubBy seems broken in recent GHCs

Cale Gibbard cgibbard at gmail.com
Tue Jun 9 15:03:18 EDT 2009


2009/6/6 Bertram Felgenhauer <bertram.felgenhauer at googlemail.com>:
> Interesting. This was changed in response to
>
>    http://hackage.haskell.org/trac/ghc/ticket/2528
>
> | Tue Sep  2 11:29:50 CEST 2008  Simon Marlow <marlowsd at gmail.com>
> |   * #2528: reverse the order of args to (==) in nubBy to match nub
> |   This only makes a difference when the (==) definition is not
> |   reflexive, but strictly speaking it does violate the report definition
> |   of nubBy, so we should fix it.
>
> It turns out that 'elem' differs from the report version and should
> have its comparison reversed. Of course that would only ever matter
> for broken Eq instances.
>
> However, the report also states that the nubBy function may assume that
> the given predicate defines an equivalence relation.
>
>    http://haskell.org/onlinereport/list.html#sect17.6
>
> So I'm not sure there's anything to be fixed here - although backing
> out the above patch probably won't hurt anybody.
>
> Bertram

Yeah, while most Eq instances really do define an equivalence relation
(at least extensionally), and the Report authors probably thought in
terms of an equivalence relation (even going so far as to say that
nubBy can assume its parameter is one, which I think was a mistake), I
think nubBy has a much more general use. It does a generalised kind of
sieving, specifically,

nubBy f xs is the unique subsequence of xs that:
1) Has the property that if x and y are elements such that x occurs
before y in it, then f x y is False.
2) The sequence of indices of selected elements is lexicographically
minimum for all subsequences satisfying condition 1. (That is, it
always picks the earliest elements possible.)

I think this is how it ought to be specified.

Similarly, groupBy f xs is (and should be) the unique list of
contiguous sublists of xs such that:
1) concat (groupBy f xs) = xs
2) If x is the head of any of the sublists and y is any other element
of that sublist, then f x y
3) The sequence of lengths of the sublists is lexicographically
maximum for all lists satisfying the first two properties (That is, it
always prefers adding elements to an earlier group to starting a new
group.)

 - Cale


More information about the Haskell-Cafe mailing list