[Haskell-cafe] Tree traversal / stack overflow
Matthew Eastman
mg.eastman at gmail.com
Wed May 20 00:02:17 EDT 2009
Hi,
I've been writing some code to calculate the stretch factor of a tree
of points. What it means is that for every node in a tree (lets call
it 'pivot'), I have to traverse the same tree (lets call each node
'current') and sum d_t(pivot, current) / d(pivot, current) for each
node, where d_t is the tree path distance between two nodes and d is
the Euclidean distance.
What I've been doing is traversing the tree, pivoting up each node
into the root position, and calculating the stretch factor for each
new tree.
ex. +===+ +===+ +===+
| 1 | | 2 | | 3 |
+===+ .---+===+----. +===+
/ / | | \ \
+---+ +---+ +---+ +---+ +---+ +---+
| 2 | ---> | 3 | | 4 | | 5 | | 1 | ---> | 2
| etc.
.+---+. +---+ +---+ +---+ +---+ .+---+.
/ | \ / | \
+---+ +---+ +---+ +---+ +---+
+---+
| 3 | | 4 | | 5 | | 4 | | 5 |
| 1 |
+---+ +---+ +---+ +---+ +---+
+---+
The code I have right now works great on point sets of size ~10,000,
but when I go up much higher things start to go wrong. I start hitting
stack overflows, so I increased the stack size. For ~100,000 points
though, running the code with +RTS -sstderr -K128m it chugged along
for over an hour and then died with a stack overflow. The stats said
it spent 50 seconds MUT time and 5300 seconds (almost 90 minutes!) GC
time, which seems odd.
The performance has been really good for lower numbers of points, but
the professor I'm working for wants to handle over a million points
later on. I've only been writing Haskell for a year, and I'm not quite
sure how to rewrite this so that it won't blow the stack, since I'm
pretty sure this kind of tree traversal can't be done with tail calls
(I would love to proved wrong, though!). Any help would be appreciated.
Thanks,
Matt
module ASF (
averageStretchFactor
) where
import Data.Tree
import Data.Foldable (foldr')
type Point = (Double,Double)
square :: Double -> Double
square x = x * x
dist :: Point -> Point -> Double
dist (x1,y1) (x2,y2) = sqrt (square (x2 - x1) + square (y2 - y1))
add :: Tree a -> Tree a -> Tree a
add (Node p sts) t = Node p (t:sts)
-- Calculate the average stretch factor of a tree of size n. It's the
sum of
-- the average stretch factor of each node in the tree, divided by n
choose 2
-- (the number of possible pairs of points in in a tree of size n)
averageStretchFactor :: Tree Point -> Double -> Double
averageStretchFactor tree n = stretchRotations tree / (n * (n - 1) / 2)
-- Calculate the stretch factor of a tree.
-- The stretch of two points is the tree path distance between the two
points
-- divided by the euclidean distance between the two points. The
stretch factor
-- of a tree is the sum of the stretches between the root and every
other node
-- in the tree.
stretchFactor :: Tree Point -> Double
stretchFactor (Node point sts) = stretch 0 point sts
where
stretch _ _ [] = 0
stretch d p ((Node p' sts'):ts) = pd / ed + stretch pd p' sts' +
stretch d p ts
where
pd = d + dist p p' -- path distance
ed = dist point p' -- euclidean distance
-- Calculate the stretch factor of every point by pulling up each node
in the
-- tree to the root position. Note that the overall structure of the
tree
-- doesn't change, we're essentially just traversing the tree and
calculating
-- the stretch factor of each node by pretending we're at the root.
stretchRotations :: Tree Point -> Double
stretchRotations tree = rotate tree []
where
rotate tree@(Node p sts) path = stretchFactor (foldr' add tree
path) + pivot [] sts path p
pivot _ [] _ _ = 0
pivot ls (r:rs) path p = rotate r (Node p (ls ++ rs) : path) +
pivot (r:ls) rs path p
./main 10000 +RTS -sstderr -K128m
ASF = 22.441
1,298,891,896 bytes allocated in the heap
20,191,904 bytes copied during GC
3,107,116 bytes maximum residency (7 sample(s))
47,480 bytes maximum slop
8 MB total memory in use (0 MB lost due to
fragmentation)
Generation 0: 2471 collections, 0 parallel, 0.06s, 0.07s
elapsed
Generation 1: 7 collections, 0 parallel, 0.03s, 0.03s
elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 13.05s ( 13.18s elapsed)
GC time 0.09s ( 0.11s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 13.14s ( 13.29s elapsed)
%GC time 0.7% (0.8% elapsed)
Alloc rate 99,506,082 bytes per MUT second
Productivity 99.3% of total user, 98.2% of total elapsed
./main 100000 +RTS -sstderr -K128m
Stack space overflow: current size 128000000 bytes.
Use `+RTS -Ksize' to increase it.
33,097,322,168 bytes allocated in the heap
3,004,248,280 bytes copied during GC
704,391,196 bytes maximum residency (29 sample(s))
67,420,196 bytes maximum slop
1731 MB total memory in use (14 MB lost due to
fragmentation)
Generation 0: 62561 collections, 0 parallel, 5364.16s, 5619.83s
elapsed
Generation 1: 29 collections, 0 parallel, 4.50s, 48.73s
elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 49.85s ( 78.94s elapsed)
GC time 5368.66s (5668.56s elapsed)
EXIT time 0.00s ( 7.88s elapsed)
Total time 5418.51s (5747.56s elapsed)
%GC time 99.1% (98.6% elapsed)
Alloc rate 663,930,946 bytes per MUT second
Productivity 0.9% of total user, 0.9% of total elapsed
More information about the Haskell-Cafe
mailing list