Proposal: make minimumBy/maximumBy go through foldl', not foldr1

Edward Kmett ekmett at gmail.com
Thu Feb 11 10:35:32 UTC 2016


The extra members that were added to the class weren't an exhaustive
selection -- it is more that the proponents that were pushing them forward
at the time stopped when they themselves got exhausted.

Having the option for tree-based vs. foldl' based accumulation would give
the best of both worlds in one respect: O(log n) time bounds for some
containers, much better constants for others.

The downside would be re-opening old wounds on the FTP front.

-Edward

On Sun, Feb 7, 2016 at 5:44 AM, Joachim Breitner <mail at joachim-breitner.de>
wrote:

> Dear List,
>
> the following discussion stalled:
>
> Am Donnerstag, den 29.10.2015, 17:34 +0200 schrieb Roman Cheplyaka:
> > I just realized (thanks to this reddit thread[1]) that minimumBy is now
> > implemented through foldr1. Previously (before FTP), minimumBy from
> Data.List
> > used foldl1, while minimumBy from Data.Foldable used foldr1.
> >
> > I think that the way these were merged was a mistake.
> >
> > Let's admit that the absolute majority of use cases for minimumBy are
> numbers,
> > for which the comparison functions are strict in both arguments. Even for
> > something like Bool, where comparison could potentially be in one of the
> > arguments, the standard compare method is derived and is strict.
> >
> > Now, for Foldable, there isn't necesarily a connection between the
> direction and
> > the way fold is performed. For example, for Data.Map.Map both folds are
> > tail-recursive. Yet foldr is non-tail-recursive at least for:
> >
> > - good old lists
> > - anything that uses the same convention for foldl/foldr as lists
> > - anything that defines its foldable instance through conversion to lists
> > - anything that uses the default implementation of foldr/foldr1
> >
> > I also put a simple microbenchmark[2] to show the difference in
> performance;
> > on it, the proposed implementation is more than 20x faster than the
> current one.
> >
> > [1]:
> https://www.reddit.com/r/haskell/comments/3qpefo/a_very_unfortunate_error_message/
> > [2]: http://lpaste.net/714261956202070016
>
> see https://mail.haskell.org/pipermail/libraries/2015-October/026430.html
>  and
> https://ghc.haskell.org/trac/ghc/ticket/10830.
>
> It seems we have the choice of either
>  1 keeping this as they are, with Data.List.maximumBy having changed
> behaviour
>    for the worse from 7.8 to 7.10
>  2 changing the behavior of Foldable.maximumBy (from foldr to foldl)
>    and changing the behavior of Data.List.maximumBy back to what it was
> before,
>    and arguably the “better” definition.
>  3 adding minimumBy/maximumBy as a class method and changing only the list
> instance.
>
> This also raises the question of whether using foldl over foldl' is the
> right thing to do, as I was under the impression that foldl will (at
> least for lists)  always just accumulate a large heap of thunks, and
> one would always want to use foldl' instead. I postulate that if foldl'
> had been in the report, then it would have been used for maximumBy.
>
> So this yields option 4:
>
>  4 changing maximumBy to use foldl'
>
>
> BTW, why is there no fold' method in Foldable that leaves the
> associativity (left, right, or mixed in case of a tree) to the Foldable
> instance, and just implements the “best” strictly accumulating fold for
> the given data structure?
>
>
> Greetings,
> Joachim
>
> --
> Joachim “nomeata” Breitner
>   mail at joachim-breitner.dehttp://www.joachim-breitner.de/
>   Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
>   Debian Developer: nomeata at debian.org
>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20160211/c12927b4/attachment-0001.html>


More information about the Libraries mailing list