# Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Ramin Honary ramin.honary at gmail.com
Thu Sep 8 10:03:07 CEST 2011

```Greetings,

It might be nice to have "nubBy" work in a way that is more intuitive to
computer scientists who expect list evaluation to work in a specific
order of evaluation, which can make Haskell more intuitive for
mathematicians, and less intuitive for most other people.

I don't work on GHC or on the Haskell language committee, but my
understanding is that making the "nubBy" function undefined for operations
that do not test for equality is a simplifying assumption that allows more
freedom for evaluation and optimization. Here is an overly-simple example,
but I hope it makes sense:

a = nubBy (==) ([10-5] ++ takeWhile (<5) [0..20])
b = nubBy (==) (nubBy (==) [5] ++ takeWhile (<5) (nubBy (==) [0..20]))

According to Haskell, both 'a' and 'b' are mathematically equivalent,
because "nubBy" is a distributive and associative function. This implies
that if the compiler can somehow produce more efficient code by first
converting 'a' into 'b' and then applying optimization, it should have the
freedom to do so, and laziness guarantees that freedom. This is a poor
example because obviously 'b' couldn't possibly be easier to optimize than
'a'. But really, who can fathom the logic of those crazy programmers who
implemented the compiler with their ridiculous (but somehow always optimal)
optimization strategies?

If you require interpretation to go by list order, then you also must
eliminate the distributive and associative properties of the "nubBy"
function. By declaring "nubBy" only work on equality operations, you
guarantee that it is associative and distributive across lists, and this
allows a host of optimization strategies to be used which would otherwise be
impossible if list-order application were required.

If list order is important for you, it is easy enough to define your own
"nubBy" function that is not distributive or associative, and can be
therefore optimized differently than when you use "Data.List.nubBy".

This blog post: <
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html> is
about C, but the principles are the same: it has a fantastic explanation
compiler implementation and optimization.

I hope that makes sense, and if I said anything inaccurate, I am at the
mercy of the Haskell-prime mailing list.

On Thu, Sep 8, 2011 at 9:07 AM, Cale Gibbard <cgibbard at gmail.com> wrote:

> I just tried this in ghci-7.0.3:
>
> ghci> nubBy (>=) [1,2,3,4]
> [1]
>
> Think about what this is doing: it is excluding 2 from the list
> because 2 >= 1, rather than including it because 1 >= 2 fails.
>
> I think an important convention when it comes to higher order
> functions on lists is that to the extent which is possible, the
> function parameters take elements from the list (or things computed
> from those) in the order in which they occur in the original list.
>
> If we reimplement it in the obvious way:
> ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f
> xs)
> ghci> nubBy (>=) [1,2,3,4]
> [1,2,3,4]
>
> I'm aware that the Report (strangely!) explicitly leaves the behaviour
> of nubBy unspecified for functions which are not equivalence
> relations, but the behaviour given by the Report implementation (the
> opposite of the current behaviour in GHC) is useful and desirable
> nonetheless.
>
> came up again for me, and I decided that I was sufficiently irritated
> by it to post again.
>
> Another thing perhaps worth pointing out is that the parameters to
> mapAccumR have always been backwards (compare it with foldr). Few
> enough people use this function that I'm fairly sure we could just
> change it without harm.
>
>  - Cale
>
> _______________________________________________