[Haskell-cafe] will the real quicksort please stand up? (or: sorting a > million element list)

Tue Nov 20 14:12:40 EST 2007

[note, the thread is almost a month old]

Bernie Pope wrote:
>
> On 23/10/2007, at 8:09 AM, Thomas Hartman wrote:
>>
>> (Prelude sort, which I think is mergesort, just blew the stack.)
>
> GHC uses a "bottom up" merge sort these days.
>
> It starts off by creating a list of singletons, then it repeatedly merges 
> adjacent pairs of lists until there
> is only one list left.
>
> I was teaching sorting to my first year students, and I also bumped into 
> the stack overflow at one million elements, using GHC's merge sort.

I think I got to the bottom of this.

Consider the following snippet:

   sort $ (take (10^6) [1..])

The argument of the sort is a list with 10^6 unevaluated elements,
[a=1, b=1+a, c=1+b, d=1+c, ...]. Now it turns out that merge sort as
implemented in the base library compares the two last elements of the
list first. This causes the evaluation of an expression that is
approximately 10^6 applications of (+) deep. And that's where you
get the stack overflow. [1]

Thomas Hartman's example is of a similar nature, it also produces a list
of unevaluated terms where each term depends on the value of the previous
one.

The modification that I proposed in
  http://www.haskell.org/pipermail/haskell-cafe/2007-October/033617.html

has the effect of comparing the first two elements first. I actually
believe that this is a reasonable change, because it's more likely to
work out fine. But it'll produce a stack overflow on

  sort $ (reverse (take (10^6) [1..]))

instead, which doesn't cause problems currently. The root problem is
the creation of deep unevaluated expressions.

Bertram

[1] Note that  sort [1..10^6]  works just fine because  [1..10^6]
    produces a list of fully evaluated values, as it compares each
    list element to the upper bound when it is generated.