Please apply the comparison function given to nubBy to elements of the list in the order in which they occur in the list.

Cale Gibbard cgibbard at gmail.com
Tue Sep 20 09:19:05 CEST 2011


Except that I didn't really say anything about evaluation order; I
said something about the semantics of the higher-order function, and
about what I wanted those semantics to be.

Now, of course, the Report definition leaves it completely ambiguous
what set of tests nubBy will perform when applied to some equivalence
relation, and there's a fair amount of wiggle room in how it gets
implemented because of that. Maybe some implementations are a little
more efficient than others, or perform fewer tests.

But that's beside the point: I want to apply nubBy to relations that
aren't equivalence relations and have it do something sensible,
because the Report's sample implementation of nubBy is actually
*useful* when applied to more general relations. Maybe not as
frequently as something like foldr, but a case where I want to apply
it to something which isn't an equivalence relation comes up maybe
every couple of months, and it's really frustrating that the
behaviour is undefined.
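As one illustration (the example is added here, not taken from the
thread): below is the Report's sample definition of nubBy, renamed
nubByReport so it does not clash with Data.List, applied to
divisibility. Divisibility is reflexive and transitive but not
symmetric, so it is not an equivalence relation. Because the Report
definition always calls the predicate as eq x y with x kept before y,
each kept number removes its later multiples, and what survives from
[2 ..] is the primes. The name primes is just for this sketch, and it
is a slow way to compute primes.

```haskell
-- The Haskell Report's sample definition of nubBy, renamed to avoid
-- clashing with Data.List.nubBy. The predicate is always called as
-- eq earlier later: x was kept before any y in xs is examined.
nubByReport :: (a -> a -> Bool) -> [a] -> [a]
nubByReport _ [] = []
nubByReport eq (x:xs) = x : nubByReport eq (filter (\y -> not (eq x y)) xs)

-- Hypothetical example: "x divides y" is not symmetric, so it is not an
-- equivalence relation. With the Report's argument order, each kept
-- number removes its later multiples, leaving the primes.
primes :: [Integer]
primes = nubByReport (\x y -> y `mod` x == 0) [2 ..]
```

Here take 10 primes gives [2,3,5,7,11,13,17,19,23,29]. With the
predicate's arguments flipped, the same call would keep every number,
since no earlier (smaller) number is a multiple of a later one.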

It's similar to only having a 'fold' function which is formally
undefined for non-associative operators, so that you can't rely on
the association that it gives you. Sure, there might be cases where
that actually helps a lot with performance (it's certainly crucial for
parallelism), but there will also be cases where it is totally
inappropriate and means that you can't really use the function for
your task.
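A small illustration of why the association matters, using the
standard Prelude folds rather than a fold with unspecified grouping:
subtraction is not associative, so foldr and foldl, which commit to
opposite groupings, give different answers on the same list.

```haskell
-- Subtraction is not associative, so the grouping a fold commits to
-- determines the result.
rightGrouped :: Int
rightGrouped = foldr (-) 0 [1, 2, 3] -- 1 - (2 - (3 - 0)) == 2

leftGrouped :: Int
leftGrouped = foldl (-) 0 [1, 2, 3] -- ((0 - 1) - 2) - 3 == -6
```

A fold whose grouping was left unspecified could return either of
these (or some other grouping), so its result for (-) could not be
relied on, even though for (+) every grouping agrees.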

The implementation currently in GHC happens to be exactly what I want
every time the need arises, except that it passes the arguments to
the predicate in the opposite order (the later element first). That
suggests it should not hurt very badly to insert a 'flip' into its
definition.
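The following is a sketch, not GHC's actual source: an
accumulator-based nubBy that remembers the elements kept so far,
written in two variants that differ only by one flip. The names
nubByAcc, nubByLaterFirst and nubByEarlierFirst are hypothetical.
nubByLaterFirst calls the predicate with the new element first, which
reproduces the ghci-7.0.3 result quoted below; nubByEarlierFirst calls
it with the earlier element first, which should keep the same elements
as the Report definition.

```haskell
-- Hypothetical accumulator-based driver: keep y unless 'dup s y' holds
-- for some element s kept earlier. 'seen' is most recent first.
nubByAcc :: (a -> a -> Bool) -> [a] -> [a]
nubByAcc dup = go []
  where
    go _ [] = []
    go seen (y:ys)
      | any (\s -> dup s y) seen = go seen ys
      | otherwise                = y : go (y : seen) ys

-- Predicate called as eq later earlier: the argument order exhibited
-- by the ghci-7.0.3 session quoted in this thread.
nubByLaterFirst :: (a -> a -> Bool) -> [a] -> [a]
nubByLaterFirst eq = nubByAcc (flip eq)

-- Predicate called as eq earlier later, as in the Report definition.
nubByEarlierFirst :: (a -> a -> Bool) -> [a] -> [a]
nubByEarlierFirst eq = nubByAcc eq
```

With these, nubByLaterFirst (>=) [1,2,3,4] gives [1], while
nubByEarlierFirst (>=) [1,2,3,4] gives [1,2,3,4]. For a genuine
equivalence relation, which is symmetric, the two variants agree.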

An implementation of nubBy which works for an arbitrary equivalence
relation and doesn't require any additional assumptions is going to be
quadratic in any case. You're not going to do better than that,
because in the worst case (a list whose elements are all pairwise
non-equivalent) you have to test every unordered pair of elements of
the list, which is n(n-1)/2 tests for a list of length n. So you're at
most talking about a constant-factor or special-case performance gain,
at the cost of leaving the semantics unspecified where it would be
useful to be able to rely on them.
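To make the worst-case count concrete, here is a hypothetical
instrumented variant of an accumulator-based nubBy (the name
nubByCounting is invented for this sketch) that returns, alongside its
result, the number of predicate tests it performed. On a list of n
pairwise non-equivalent elements every new element is tested against
every element kept so far, so the count is 0 + 1 + ... + (n - 1) =
n(n - 1)/2.

```haskell
-- Hypothetical instrumented nubBy: returns the result together with
-- the number of times the predicate was called.
nubByCounting :: (a -> a -> Bool) -> [a] -> ([a], Int)
nubByCounting eq = go [] 0
  where
    go _ n [] = ([], n)
    go seen n (y:ys) =
      let (hit, k) = probe seen 0
          seen' = if hit then seen else y : seen
          (rest, total) = go seen' (n + k) ys
      in (if hit then rest else y : rest, total)
      where
        -- Test y against each kept element (earlier element first),
        -- counting tests and stopping at the first match.
        probe [] c = (False, c)
        probe (s:ss) c
          | eq s y    = (True, c + 1)
          | otherwise = probe ss (c + 1)
```

For example, nubByCounting (==) [1 .. 100 :: Int] returns the list
unchanged together with 4950 tests, i.e. 100 * 99 / 2; the best case,
a list of identical elements, needs only one test per element after
the first.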

On 8 September 2011 04:03, Ramin Honary <ramin.honary at gmail.com> wrote:
> Greetings,
> It might be nice to have "nubBy" work in a way that is more intuitive to
> computer scientists who expect list evaluation to work in a specific
> order. Unfortunately, Haskell is quite explicit about not specifying the
> order of evaluation, which can make Haskell more intuitive for
> mathematicians, and less intuitive for most other people.
> I don't work on GHC or on the Haskell language committee, but my
> understanding is that making the "nubBy" function undefined for operations
> that do not test for equality is a simplifying assumption that allows more
> freedom for evaluation and optimization. Here is an overly-simple example,
> but I hope it makes sense:
> a = nubBy (==) ([10-5] ++ takeWhile (<5) [0..20])
> b = nubBy (==) (nubBy (==) [5] ++ takeWhile (<5) (nubBy (==) [0..20]))
> According to Haskell, both 'a' and 'b' are mathematically equivalent,
> because "nubBy" is a distributive and associative function. This implies
> that if the compiler can somehow produce more efficient code by first
> converting 'a' into 'b' and then applying optimization, it should have the
> freedom to do so, and laziness guarantees that freedom. This is a poor
> example because obviously 'b' couldn't possibly be easier to optimize than
> 'a'. But really, who can fathom the logic of those crazy programmers who
> implemented the compiler with their ridiculous (but somehow always optimal)
> optimization strategies?
> If you require interpretation to go by list order, then you also must
> eliminate the distributive and associative properties of the "nubBy"
> function. By declaring that "nubBy" only works on equality operations, you
> guarantee that it is associative and distributive across lists, and this
> allows a host of optimization strategies to be used which would otherwise be
> impossible if list-order application were required.
> If list order is important for you, it is easy enough to define your own
> "nubBy" function that is not distributive or associative, and can therefore
> be optimized differently than when you use "Data.List.nubBy".
> This blog post:
> <http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html> is
> about C, but the principles are the same: it has a fantastic explanation
> about how "undefined behavior" can be really helpful with simplifying
> compiler implementation and optimization.
> I hope that makes sense, and if I said anything inaccurate, I am at the
> mercy of the Haskell-prime mailing list.
>
> On Thu, Sep 8, 2011 at 9:07 AM, Cale Gibbard <cgibbard at gmail.com> wrote:
>>
>> I just tried this in ghci-7.0.3:
>>
>> ghci> nubBy (>=) [1,2,3,4]
>> [1]
>>
>> Think about what this is doing: it is excluding 2 from the list
>> because 2 >= 1, rather than including it because 1 >= 2 fails.
>>
>> I think an important convention when it comes to higher-order
>> functions on lists is that, as far as possible, the function
>> parameters take elements from the list (or things computed from
>> those) in the order in which they occur in the original list.
>>
>> If we reimplement it in the obvious way:
>> ghci> let nubBy f [] = []; nubBy f (x:xs) = x : filter (not . f x) (nubBy f xs)
>> ghci> nubBy (>=) [1,2,3,4]
>> [1,2,3,4]
>>
>> I'm aware that the Report (strangely!) explicitly leaves the behaviour
>> of nubBy unspecified for functions which are not equivalence
>> relations, but the behaviour given by the Report implementation (the
>> opposite of the current behaviour in GHC) is useful and desirable
>> nonetheless.
>>
>> I'm sure I've written about this before. I'm not entirely sure what
>> happened to the previous thread of discussion about this, but it just
>> came up again for me, and I decided that I was sufficiently irritated
>> by it to post again.
>>
>> Another thing perhaps worth pointing out is that the parameters to
>> mapAccumR have always been backwards (compare it with foldr). Few
>> enough people use this function that I'm fairly sure we could just
>> change it without harm.
>>
>>  - Cale
>>
>> _______________________________________________
>> Haskell-prime mailing list
>> Haskell-prime at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-prime
>
>



More information about the Libraries mailing list