# [Haskell-cafe] natural mergesort in Data.List

David Feuer david.feuer at gmail.com
Tue Sep 18 14:58:24 UTC 2018

```It may. Or it may not. The performance impact of such small changes can be
hard to predict.

On Tue, Sep 18, 2018, 10:38 AM Christian Sternagel <c.sternagel at gmail.com>
wrote:

> Dear David,
>
> thanks for the pointer.
>
> Btw: I was able to modify my complexity proof so that "sequences" is no
> longer required to only contain non-empty lists. Sorry for the noise.
>
> But maybe such a "sequences" is not entirely useless: Did I understand
> correctly that what you are saying is that knowing that all elements in
> "sequences" are non-empty lists might have some impact on performance
> due to being able to use "NonEmpty a"?
>
> cheers
>
> chris
>
> On 09/18/2018 03:18 PM, David Feuer wrote:
> > data NonEmpty a = a :| [a]
> >
> > It's a nonempty list, defined in Data.List.NonEmpty.
> >
> > On Tue, Sep 18, 2018, 9:14 AM Christian Sternagel <c.sternagel at gmail.com
> > <mailto:c.sternagel at gmail.com>> wrote:
> >
> >     Dear David,
> >
> >     I am guaranteeing (since I proved it in Isabelle/HOL) that the
> following
> >     version of "sequences" does not contain empty lists in its result (I
> am
> >     copying from my Isabelle formalization, in order to avoid typos) ;)
> >
> >     fun sequences :: "'a list ⇒ 'a list list"
> >       and asc :: "'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list"
> >       and desc :: "'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list"
> >       where
> >         "sequences (a # b # xs) =
> >           (if key a > key b then desc b [a] xs else asc b ((#) a) xs)"
> >       | "sequences [x] = [[x]]"
> >       | "sequences [] = []"
> >       | "asc a as (b # bs) =
> >           (if ¬ key a > key b then asc b (λys. as (a # ys)) bs
> >           else as [a] # sequences (b # bs))"
> >       | "asc a as [] = [as [a]]"
> >       | "desc a as (b # bs) =
> >           (if key a > key b then desc b (a # as) bs
> >           else (a # as) # sequences (b # bs))"
> >       | "desc a as [] = [a # as]"
> >
> >     The "key" function is an implicit first parameter for each of
> >     "sequences", "asc", and "desc" above. The fact that I am using a
> "key"
> >     function instead of a comparator is due to Isabelle/HOL's standard
> >     library. Also, there are no pattern guards in Isabelle/HOL. Anyway,
> it
> >     should be relatively straight-forward to translate these functions
> into
> >
> >     Another thing: I just realized that "merge_pairs" in my formalization
> >     also differs from "mergePairs", since with the changed "sequences" it
> >     might actually get an empty list as input, in which case the current
> >     "mergePairs" wouldn't terminate at all.
> >
> >     So for those who are interested, the full definition of mergesort
> can be
> >     found here
> >
> >
> >
> https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.html
> >
> >     where mergesort is called "msort_key".
> >
> >     cheers
> >
> >     chris
> >
> >     Btw: What is "NonEmpty x"?
> >
> >     On 09/18/2018 11:55 AM, David Feuer wrote:
> >     > If you're guaranteeing that the result won't contain empty lists,
> it
> >     > would be worth benchmarking the effect of using NonEmpty x instead
> of
> >     > [x] in that spot.
> >     >
> >     > On Tue, Sep 18, 2018, 4:21 AM Christian Sternagel
> >     <c.sternagel at gmail.com <mailto:c.sternagel at gmail.com>
> >     > <mailto:c.sternagel at gmail.com <mailto:c.sternagel at gmail.com>>>
> wrote:
> >     >
> >     >     Dear Cafe,
> >     >
> >     >     some years ago I formalized the mergesort implementation [1]
> from
> >     >
> >     >
> >     >
> >
> >     >
> >     >     (as far as I can tell it didn't change since 2012) in the proof
> >     >     assistant Isabelle/HOL [2].
> >     >
> >     >     More specifically, I proved its functional correctness (the
> >     result is
> >     >     sorted and contains all elements of the input with exactly the
> >     same
> >     >     multiplicities) and that it is a stable sorting algorithm.
> >     >
> >     >     Very recently I also formalized a complexity result in
> >     Isabelle/HOL,
> >     >     namely that the number of comparisons is bounded from above by
> >     >
> >     >       n + n * ⌈log 2 n⌉
> >     >
> >     >     for lists of length n.
> >     >
> >     >     For this proof I had to change the definition of "sequences",
> >     >     "ascending", and "descending" slightly.
> >     >
> >     >     Now here is my question: Does anyone now of reasons why the
> >     current
> >     >     implementation of "sequences" is allowed to produce spurious
> >     empty lists
> >     >     in its result? The version I used in my formalization only
> >     differs in
> >     >     the following three spots:
> >     >
> >     >       sequences [x] = [[x]] -- this is the only important change,
> >     since
> >     >       sequences [] = []     -- then the result does not contain
> >     empty lists
> >     >
> >     >
> >     >       sequences xs = [xs]
> >     >
> >     >     and
> >     >
> >     >       ascending a as [] = let !x = as [a] in [x]
> >     >
> >     >
> >     >       ascending a as bs = let !x = as [a] in x : sequences bs
> >     >
> >     >     and
> >     >
> >     >       descending a as []  = [a:as]
> >     >
> >     >
> >     >       descending a as bs = (a:as) : sequences bs
> >     >
> >     >     [1] https://www.isa-afp.org/entries/Efficient-Mergesort.html
> >     >     [2] http://isabelle.in.tum.de/
> >     >     _______________________________________________
> >     >     Haskell-Cafe mailing list
> >     >     To (un)subscribe, modify options or view archives go to: