[Haskell-cafe] Tree traversal / stack overflow

Matthew Eastman mg.eastman at gmail.com
Wed May 20 00:02:17 EDT 2009


Hi,

I've been writing some code to calculate the stretch factor of a tree  
of points. What it means is that for every node in a tree (lets call  
it 'pivot'), I have to traverse the same tree (lets call each node  
'current') and sum d_t(pivot, current) / d(pivot, current) for each  
node, where d_t is the tree path distance between two nodes and d is  
the Euclidean distance.

What I've been doing is traversing the tree, pivoting up each node  
into the root position, and calculating the stretch factor for each  
new tree.

ex.          +===+                 +===+                  +===+
              | 1 |                 | 2 |                  | 3 |
              +===+             .---+===+----.             +===+
             /                 /    |   |     \                 \
        +---+             +---+  +---+  +---+  +---+             +---+
        | 2 |     --->    | 3 |  | 4 |  | 5 |  | 1 |     --->    | 2  
|   etc.
       .+---+.            +---+  +---+  +---+  +---+            .+---+.
      /   |   \                                                /   |   \
+---+  +---+  +---+                                      +---+  +---+   
+---+
| 3 |  | 4 |  | 5 |                                      | 4 |  | 5 |   
| 1 |
+---+  +---+  +---+                                      +---+  +---+   
+---+


The code I have right now works great on point sets of size ~10,000,  
but when I go up much higher things start to go wrong. I start hitting  
stack overflows, so I increased the stack size. For ~100,000 points  
though, running the code with +RTS -sstderr -K128m it chugged along  
for over an hour and then died with a stack overflow. The stats said  
it spent 50 seconds MUT time and 5300 seconds (almost 90 minutes!) GC  
time, which seems odd.

The performance has been really good for lower numbers of points, but  
the professor I'm working for wants to handle over a million points  
later on. I've only been writing Haskell for a year, and I'm not quite  
sure how to rewrite this so that it won't blow the stack, since I'm  
pretty sure this kind of tree traversal can't be done with tail calls  
(I would love to proved wrong, though!). Any help would be appreciated.

Thanks,
Matt

module ASF (
     averageStretchFactor
) where

import Data.Tree
import Data.Foldable (foldr')

type Point = (Double,Double)

square :: Double -> Double
square x = x * x

dist :: Point -> Point -> Double
dist (x1,y1) (x2,y2) = sqrt (square (x2 - x1) + square (y2 - y1))

add :: Tree a -> Tree a -> Tree a
add (Node p sts) t = Node p (t:sts)

-- Calculate the average stretch factor of a tree of size n. It's the  
sum of
-- the average stretch factor of each node in the tree, divided by n  
choose 2
-- (the number of possible pairs of points in in a tree of size n)
averageStretchFactor :: Tree Point -> Double -> Double
averageStretchFactor tree n = stretchRotations tree / (n * (n - 1) / 2)

-- Calculate the stretch factor of a tree.
-- The stretch of two points is the tree path distance between the two  
points
-- divided by the euclidean distance between the two points. The  
stretch factor
-- of a tree is the sum of the stretches between the root and every  
other node
-- in the tree.
stretchFactor :: Tree Point -> Double
stretchFactor (Node point sts) = stretch 0 point sts
   where
     stretch _ _ [] = 0
     stretch d p ((Node p' sts'):ts) = pd / ed + stretch pd p' sts' +  
stretch d p ts
       where
         pd = d + dist p p' -- path distance
         ed = dist point p' -- euclidean distance

-- Calculate the stretch factor of every point by pulling up each node  
in the
-- tree to the root position. Note that the overall structure of the  
tree
-- doesn't change, we're essentially just traversing the tree and  
calculating
-- the stretch factor of each node by pretending we're at the root.
stretchRotations :: Tree Point -> Double
stretchRotations tree = rotate tree []
   where
     rotate tree@(Node p sts) path = stretchFactor (foldr' add tree  
path) + pivot [] sts path p
     pivot _  []     _    _ = 0
     pivot ls (r:rs) path p = rotate r (Node p (ls ++ rs) : path) +  
pivot (r:ls) rs path p


./main 10000 +RTS -sstderr -K128m
ASF = 22.441
    1,298,891,896 bytes allocated in the heap
       20,191,904 bytes copied during GC
        3,107,116 bytes maximum residency (7 sample(s))
           47,480 bytes maximum slop
                8 MB total memory in use (0 MB lost due to  
fragmentation)

   Generation 0:  2471 collections,     0 parallel,  0.06s,  0.07s  
elapsed
   Generation 1:     7 collections,     0 parallel,  0.03s,  0.03s  
elapsed

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time   13.05s  ( 13.18s elapsed)
   GC    time    0.09s  (  0.11s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time   13.14s  ( 13.29s elapsed)

   %GC time       0.7%  (0.8% elapsed)

   Alloc rate    99,506,082 bytes per MUT second

   Productivity  99.3% of total user, 98.2% of total elapsed


./main 100000 +RTS -sstderr -K128m
Stack space overflow: current size 128000000 bytes.
Use `+RTS -Ksize' to increase it.
   33,097,322,168 bytes allocated in the heap
    3,004,248,280 bytes copied during GC
      704,391,196 bytes maximum residency (29 sample(s))
       67,420,196 bytes maximum slop
             1731 MB total memory in use (14 MB lost due to  
fragmentation)

   Generation 0: 62561 collections,     0 parallel, 5364.16s, 5619.83s  
elapsed
   Generation 1:    29 collections,     0 parallel,  4.50s, 48.73s  
elapsed

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time   49.85s  ( 78.94s elapsed)
   GC    time  5368.66s  (5668.56s elapsed)
   EXIT  time    0.00s  (  7.88s elapsed)
   Total time  5418.51s  (5747.56s elapsed)

   %GC time      99.1%  (98.6% elapsed)

   Alloc rate    663,930,946 bytes per MUT second

   Productivity   0.9% of total user, 0.9% of total elapsed




More information about the Haskell-Cafe mailing list