# [Haskell-cafe] sort and lazyness (?)

Daniel Kraft d at domob.eu
Fri Dec 19 08:40:40 EST 2008

```Hi,

I'm just a beginner trying to learn a little about Haskell, and as such
write some toy programs (e.g. for projecteuler.net) in Haskell.

Currently, I'm experiencing what I would call "strange behaviour":

I've got a data-type

data Fraction = Fraction Int Int

to hold rational numbers (maybe there's already some built-in type for
this in Haskell, much like for instance Scheme has a rational type?),
and then I compute a list of pairs of those numbers, that is
[(Fraction, Fraction)].  Fraction is declared an instance of Ord.

This list has up to 3 million elements.  If I do

main = print \$ length \$ points

then the program prints out the length fine and takes around 6s to
finish (compiled with GHC -O3).  Ok, but I acknowledge that length isn't
quite an expensive function, as I can imagine that Haskell does not
compute and hold the entire list for this but instead each element at
once and discards it afterwards.

Doing

main = print \$ length \$ map (\(x, _) -> x == Fraction 1 2) points

instead, gives a slightly longer runtime (6.6s), but in this case I'm
sure that at least each element is computed; right?

main = print \$ length \$ reverse points

gives 11.9s, and here I guess (?) that for this to work, the entire list
is computed and hold in memory.

However, trying to do

import List
main = print \$ length \$ sort points

makes memory usage go up and the program does not finish in 15m, also
spending most time waiting for swapped out memory.  What am I doing
wrong, why is sort this expensive in this case?  I would assume that
computing and holding the whole list does not take too much memory,
given its size and data type; doing the very same calculation in C
should be straight forward.  And sort should be O(n * log n) for time
and also not much more expensive in memory, right?

Am I running into a problem with lazyness?  What can I do to avoid it?
As far as I see it though, the reverse or map call above should do
nearly the same as sort, except maybe that the list needs to be stored
in memory as a whole and sort has an additional *log n factor, but
neither of those should matter.  What's the problem here?

Is this something known with sort or similar functions?  I couldn't find
anything useful on Google, though.  My code is below, and while I would
of course welcome critics, I do not want to persuade anyone to read
through it.

Thanks a lot,
Daniel

----------------------------------------------------------

-- Problem 165: Intersections of lines.

import List

-- The random number generator.

seeds :: [Integer]
seeds = 290797 : [ mod (x * x) 50515093 | x <- seeds ]

numbers :: [Int]
numbers = map fromInteger (map (\x -> mod x 500) (tail seeds))

-- Line segments, vectors and fractions.

data Segment = Segment Int Int Int Int
data Vector = Vector Int Int

data Fraction = Fraction Int Int
instance Eq Fraction where
(Fraction a b) == (Fraction c d) = (a == c && b == d)
instance Ord Fraction where
compare (Fraction a b) (Fraction c d)
| (a == c && b == d) = EQ
| b > 0     = if a * d < b * c then LT else GT
| otherwise = if a * d > b * c then LT else GT

-- Build a normalized fraction and get its value.

normalize (Fraction a b) = let g = gcd a b;
aa = div a g;
bb = div b g in
if bb < 0
then Fraction (-aa) (-bb)
else Fraction aa bb

-- Find the inner product of two vectors.

innerProduct (Vector a b) (Vector c d) = a * c + b * d

-- Find the normal vector and direction of a line segment, as well as
-- the constant in straight-normal form for a given normal vector.

normalVector (Segment x1 y1 x2 y2) = Vector (y1 - y2) (x2 - x1)
direction (Segment x1 y1 x2 y2) = Vector (x2 - x1) (y2 - y1)
nfConstant (Segment x1 y1 x2 y2) n = innerProduct n (Vector x1 y1)

-- Check if a point is between the ends of the segment times D.

betweenEndsTimes d (Segment x1 y1 x2 y2) xD yD
= let x1D = x1 * d; x2D = x2 * d; y1D = y1 * d; y2D = y2 * d;
xDMin = min x1D x2D; xDMax = max x1D x2D;
yDMin = min y1D y2D; yDMax = max y1D y2D in
(xDMin <= xD && xDMax >= xD && yDMin <= yD && yDMax >= yD
&& (x1D /= xD || y1D /= yD) && (x2D /= xD || y2D /= yD))

-- If they are not parallel, we can find their intersection point (at least,
-- the one it would be if both were straights).  Then it is easy to check if
-- it is between the endpoints for both.
--
-- n1 * x + m1 * y = c1
-- n2 * x + m2 * y = c2
--
-- => x = (c1 * m2 - x2 * m1) / (n1 * m2 - n2 * m1)
-- => y = (n1 * c2 - n2 * c1) / (n1 * m2 - n2 * m1)
--
-- (Iff they are parallel, the determinant will be 0.)

trueIntersect s1 s2 = let (Vector n1 m1) = normalVector s1;
(Vector n2 m2) = normalVector s2;
c1 = nfConstant s1 (Vector n1 m1);
c2 = nfConstant s2 (Vector n2 m2);
d = n1 * m2 - n2 * m1;
xD = c1 * m2 - c2 * m1;
yD = n1 * c2 - n2 * c1 in
if d == 0
then Nothing
else
if (betweenEndsTimes d s1 xD yD)
&& (betweenEndsTimes d s2 xD yD)
then Just ((normalize \$ Fraction xD d),
(normalize \$ Fraction yD d))
else Nothing

-- Build list of segments.

takeEveryForth :: [Int] -> [Int]
takeEveryForth (a:_:_:_:t) = a : (takeEveryForth t)

n1 = numbers
n2 = tail n1
n3 = tail n2
n4 = tail n3

segments = [ Segment a b c d | ((a, b), (c, d))
<- zip (zip (takeEveryForth n1)
(takeEveryForth n2))
(zip (takeEveryForth n3)
(takeEveryForth n4)) ]

-- For the first 5000 segments, calculate intersections.

firstSegments = take 5000 segments

intersects :: [Maybe (Fraction, Fraction)]
intersects = findInters firstSegments []
where
findInters [] l = l
findInters (h:t) l = findInters t (addInters h t l)
where
addInters _ [] l = l
addInters e (h:t) l = addInters e t ((trueIntersect e h) : l)

getPoints :: [Maybe (Fraction, Fraction)] -> [(Fraction, Fraction)]
getPoints [] = []
getPoints (Nothing : t) = getPoints t
getPoints ((Just v) : t) = v : (getPoints t)

points = getPoints intersects

-- Main program.

main :: IO ()
main = print \$ length \$ reverse points
--main = print \$ length \$ map (\(x, _) -> x == Fraction 1 2) points
--main = print \$ length \$ sort points

```