[Haskell-cafe] sort and lazyness (?)

Fri Dec 19 09:37:29 EST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 19, 2008, at 7:40 AM, Daniel Kraft wrote:

> data Fraction = Fraction Int Int
>
> to hold rational numbers (maybe there's already some built-in type  
> for this in Haskell, much like for instance Scheme has a rational  
> type?),

There is one. It is called Rational. :)

> and then I compute a list of pairs of those numbers, that is
> [(Fraction, Fraction)].  Fraction is declared an instance of Ord.
>
> This list has up to 3 million elements.  If I do
>
> main = print $ length $ points
>
> then the program prints out the length fine and takes around 6s to  
> finish (compiled with GHC -O3).  Ok, but I acknowledge that length  
> isn't quite an expensive function, as I can imagine that Haskell  
> does not compute and hold the entire list for this but instead each  
> element at once and discards it afterwards.

Unless something has changed since I last checked, there is little  
difference between ghc -O2 and ghc -O3, and the latter can even be  
slower much of the time. It may depend on the situation, but I'm just  
letting you know.

In this case, the runtime should not actually be computing _any_ of  
the elements of the list since it doesn't care what their values are.  
You are only counting them, not using them, so it really only computes  
the spine of the list.

> Doing
>
> main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
>
> instead, gives a slightly longer runtime (6.6s), but in this case  
> I'm sure that at least each element is computed; right?

No. The elements' values are still not actually used, so you are still  
only evaluating the spine of the list.

> main = print $ length $ reverse points
>
> gives 11.9s, and here I guess (?) that for this to work, the entire  
> list is computed and hold in memory.

This is beginning to sound like you think Haskell lists are arrays.  
They aren't. They are actually linked lists. In order to reverse a  
linked list, you have to travel all the way to the end and then  
reconstruct it from there. This explains the time increase. It doesn't  
really have anything to do with computing the elements or holding  
anything in memory.

> However, trying to do
>
> import List
> main = print $ length $ sort points
>
> makes memory usage go up and the program does not finish in 15m,  
> also spending most time waiting for swapped out memory.  What am I  
> doing wrong, why is sort this expensive in this case?  I would  
> assume that computing and holding the whole list does not take too  
> much memory, given its size and data type; doing the very same  
> calculation in C should be straight forward.  And sort should be O(n  
> * log n) for time and also not much more expensive in memory, right?

This is actually the first time you really evaluated any of the  
elements of the list, hence a massive increase in memory use.

- - Jake
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAklLsaoACgkQye5hVyvIUKlJYgCfYtwPv/neOnl3+wIu8VhIqfoA
lXMAn3EmANZbSRyYeOiXtOdGl7hxCj34
=NJwT
-----END PGP SIGNATURE-----