[Haskell-cafe] sort and lazyness (?)
Daniel Kraft
d at domob.eu
Fri Dec 19 08:40:40 EST 2008
Hi,
I'm just a beginner trying to learn a little about Haskell, and as such
write some toy programs (e.g. for projecteuler.net) in Haskell.
Currently, I'm experiencing what I would call "strange behaviour":
I've got a data-type
data Fraction = Fraction Int Int
to hold rational numbers (maybe there's already some built-in type for
this in Haskell, much like for instance Scheme has a rational type?),
and then I compute a list of pairs of those numbers, that is
[(Fraction, Fraction)]. Fraction is declared an instance of Ord.
This list has up to 3 million elements. If I do
main = print $ length $ points
then the program prints out the length fine and takes around 6s to
finish (compiled with GHC -O3). Ok, but I acknowledge that length isn't
quite an expensive function, as I can imagine that Haskell does not
compute and hold the entire list for this but instead each element at
once and discards it afterwards.
Doing
main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
instead, gives a slightly longer runtime (6.6s), but in this case I'm
sure that at least each element is computed; right?
main = print $ length $ reverse points
gives 11.9s, and here I guess (?) that for this to work, the entire list
is computed and hold in memory.
However, trying to do
import List
main = print $ length $ sort points
makes memory usage go up and the program does not finish in 15m, also
spending most time waiting for swapped out memory. What am I doing
wrong, why is sort this expensive in this case? I would assume that
computing and holding the whole list does not take too much memory,
given its size and data type; doing the very same calculation in C
should be straight forward. And sort should be O(n * log n) for time
and also not much more expensive in memory, right?
Am I running into a problem with lazyness? What can I do to avoid it?
As far as I see it though, the reverse or map call above should do
nearly the same as sort, except maybe that the list needs to be stored
in memory as a whole and sort has an additional *log n factor, but
neither of those should matter. What's the problem here?
Is this something known with sort or similar functions? I couldn't find
anything useful on Google, though. My code is below, and while I would
of course welcome critics, I do not want to persuade anyone to read
through it.
Thanks a lot,
Daniel
----------------------------------------------------------
-- Problem 165: Intersections of lines.
import List
-- The random number generator.
seeds :: [Integer]
seeds = 290797 : [ mod (x * x) 50515093 | x <- seeds ]
numbers :: [Int]
numbers = map fromInteger (map (\x -> mod x 500) (tail seeds))
-- Line segments, vectors and fractions.
data Segment = Segment Int Int Int Int
data Vector = Vector Int Int
data Fraction = Fraction Int Int
instance Eq Fraction where
(Fraction a b) == (Fraction c d) = (a == c && b == d)
instance Ord Fraction where
compare (Fraction a b) (Fraction c d)
| (a == c && b == d) = EQ
| b > 0 = if a * d < b * c then LT else GT
| otherwise = if a * d > b * c then LT else GT
-- Build a normalized fraction and get its value.
normalize (Fraction a b) = let g = gcd a b;
aa = div a g;
bb = div b g in
if bb < 0
then Fraction (-aa) (-bb)
else Fraction aa bb
-- Find the inner product of two vectors.
innerProduct (Vector a b) (Vector c d) = a * c + b * d
-- Find the normal vector and direction of a line segment, as well as
-- the constant in straight-normal form for a given normal vector.
normalVector (Segment x1 y1 x2 y2) = Vector (y1 - y2) (x2 - x1)
direction (Segment x1 y1 x2 y2) = Vector (x2 - x1) (y2 - y1)
nfConstant (Segment x1 y1 x2 y2) n = innerProduct n (Vector x1 y1)
-- Check if a point is between the ends of the segment times D.
betweenEndsTimes d (Segment x1 y1 x2 y2) xD yD
= let x1D = x1 * d; x2D = x2 * d; y1D = y1 * d; y2D = y2 * d;
xDMin = min x1D x2D; xDMax = max x1D x2D;
yDMin = min y1D y2D; yDMax = max y1D y2D in
(xDMin <= xD && xDMax >= xD && yDMin <= yD && yDMax >= yD
&& (x1D /= xD || y1D /= yD) && (x2D /= xD || y2D /= yD))
-- If they are not parallel, we can find their intersection point (at least,
-- the one it would be if both were straights). Then it is easy to check if
-- it is between the endpoints for both.
--
-- n1 * x + m1 * y = c1
-- n2 * x + m2 * y = c2
--
-- => x = (c1 * m2 - x2 * m1) / (n1 * m2 - n2 * m1)
-- => y = (n1 * c2 - n2 * c1) / (n1 * m2 - n2 * m1)
--
-- (Iff they are parallel, the determinant will be 0.)
trueIntersect s1 s2 = let (Vector n1 m1) = normalVector s1;
(Vector n2 m2) = normalVector s2;
c1 = nfConstant s1 (Vector n1 m1);
c2 = nfConstant s2 (Vector n2 m2);
d = n1 * m2 - n2 * m1;
xD = c1 * m2 - c2 * m1;
yD = n1 * c2 - n2 * c1 in
if d == 0
then Nothing
else
if (betweenEndsTimes d s1 xD yD)
&& (betweenEndsTimes d s2 xD yD)
then Just ((normalize $ Fraction xD d),
(normalize $ Fraction yD d))
else Nothing
-- Build list of segments.
takeEveryForth :: [Int] -> [Int]
takeEveryForth (a:_:_:_:t) = a : (takeEveryForth t)
n1 = numbers
n2 = tail n1
n3 = tail n2
n4 = tail n3
segments = [ Segment a b c d | ((a, b), (c, d))
<- zip (zip (takeEveryForth n1)
(takeEveryForth n2))
(zip (takeEveryForth n3)
(takeEveryForth n4)) ]
-- For the first 5000 segments, calculate intersections.
firstSegments = take 5000 segments
intersects :: [Maybe (Fraction, Fraction)]
intersects = findInters firstSegments []
where
findInters [] l = l
findInters (h:t) l = findInters t (addInters h t l)
where
addInters _ [] l = l
addInters e (h:t) l = addInters e t ((trueIntersect e h) : l)
getPoints :: [Maybe (Fraction, Fraction)] -> [(Fraction, Fraction)]
getPoints [] = []
getPoints (Nothing : t) = getPoints t
getPoints ((Just v) : t) = v : (getPoints t)
points = getPoints intersects
-- Main program.
main :: IO ()
main = print $ length $ reverse points
--main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
--main = print $ length $ sort points
More information about the Haskell-Cafe
mailing list