[Haskell-cafe] Profiling help (Warning: Euler spoilers)

James Jackson j.e.m.jackson at gmail.com
Wed Mar 3 21:47:13 EST 2010


I have written the program below, which solves problem 14 from Project
Euler: find the starting number below one million whose hailstone
(Collatz) sequence takes the longest to reach 1.  The program memoises
chain lengths in a Data.Map, i.e. dynamic programming.  It completes in
under a minute (barely), but allocates a ridiculous amount of memory
and actually runs slower than the naive one-liner it was supposed to
replace.  I have an intuition about what the problem is, but would like
to verify it by profiling.
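
For reference, the naive version was roughly along these lines (a
reconstruction rather than my exact one-liner):

---------------------------------------------------------------------------------------------------------------------------
-- Reconstructed sketch of the naive approach, no memoisation:
-- recompute every chain from scratch and keep the longest.
naiveFourteen :: (Integer, Integer)
naiveFourteen = maximum [ (chainLength n, n) | n <- [1..10^6] ]
  where
    chainLength :: Integer -> Integer
    chainLength 1 = 1
    chainLength n
      | even n    = 1 + chainLength (div n 2)
      | otherwise = 1 + chainLength (3*n + 1)
---------------------------------------------------------------------------------------------------------------------------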

Unfortunately, I don't know enough about how profiling works in GHC to
diagnose the problem.  Hopefully the Cafe can help.


Here's the code.

scratch.hs: This is the main program, which just calls the
problemFourteen function.

The EP datatype just pairs an Int (the problem number) with an (IO ())
action that prints the solution.
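
Roughly, it looks like this (a sketch; the actual definition lives in
Euler.hs):

---------------------------------------------------------------------------------------------------------------------------
-- Sketch of the EP type as described above; the real definition in
-- Euler.hs may differ in detail.
data EP = EP Int (IO ())
---------------------------------------------------------------------------------------------------------------------------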

---------------------------------------------------------------------------------------------------------------------------
import Euler1to20 -- contains problems 1 through 20
import Euler      -- contains various utility functions, including the
                  -- EP type which represents Euler problems
main = proc
   where (EP num proc) = problemFourteen
---------------------------------------------------------------------------------------------------------------------------

Euler1to20.hs: This contains the following lines, among others:
---------------------------------------------------------------------------------------------------------------------------
problemFourteen = EP 14 $ print $ (key, fromJust $ findIndex (==key) keys)
   where (keys, results) = runState (mapM findChainLength [1..10^6])
                           $ M.fromList [(1,1)]
         key = maximum keys

findChainLength :: Integer -> State (M.Map Integer Integer) Integer
findChainLength n = do
 m <- get
 case M.lookup n m of
   Nothing -> let nextLink = if even n then div n 2 else 3*n+1
              in do
                len <- liftM (1+) (findChainLength nextLink)
                m2 <- get
                m2 `seq` put (M.insert n len m2)
                return len
   Just n' -> return n'
---------------------------------------------------------------------------------------------------------------------------
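As a quick sanity check on findChainLength, the problem statement's
example (the chain starting at 13 has 10 terms) can be reproduced with
something like the following (assuming evalState is in scope from
Control.Monad.State and M is the qualified Data.Map import):

---------------------------------------------------------------------------------------------------------------------------
-- Sanity check: the hailstone chain starting at 13 has 10 terms.
sanityThirteen :: Integer
sanityThirteen = evalState (findChainLength 13) (M.fromList [(1,1)])  -- expected: 10
---------------------------------------------------------------------------------------------------------------------------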

My guess is that the problem is in the recursive call to
findChainLength, which is not in tail position.  I think that this is
causing the code to keep too many old copies of the map around.
Before I go rewriting everything, however, I'd like to verify that
this is indeed the problem, so the next step is to compile and run
with profiling turned on.  After quickly looking at the Real World
Haskell chapter on profiling, I do the following.

Compiling and running:

---------------------------------------------------------------------------------------------------------------------------
ghc -prof --make -auto-all -fforce-recomp scratch.hs
[1 of 3] Compiling Euler            ( Euler.hs, Euler.o )
[2 of 3] Compiling Euler1to20       ( Euler1to20.hs, Euler1to20.o )
[3 of 3] Compiling Main             ( scratch.hs, scratch.o )
Linking scratch ...

./scratch +RTS -sstderr -hc -p -K100M
(525,837798)
9,503,907,112 bytes allocated in the heap
42,785,764,376 bytes copied during GC (scavenged)
15,697,610,688 bytes copied during GC (not scavenged)
421,548,032 bytes maximum residency (193 sample(s))

     17787 collections in generation 0 (650.00s)
       193 collections in generation 1 (1748.37s)

       782 Mb total memory in use

 INIT  time    0.00s  (  0.00s elapsed)
 MUT   time   18.68s  ( 19.19s elapsed)
 GC    time  2398.37s  (2401.47s elapsed)
 RP    time    0.00s  (  0.00s elapsed)
 PROF  time   50.48s  ( 50.50s elapsed)
 EXIT  time    0.00s  (  0.00s elapsed)
 Total time  2467.53s  (2471.17s elapsed)

 %GC time      97.2%  (97.2% elapsed)

 Alloc rate    508,742,769 bytes per MUT second

 Productivity   0.8% of total user, 0.8% of total elapsed

---------------------------------------------------------------------------------------------------------------------------
I can see from the above that the program uses 782 MB of memory at its
peak (with a maximum residency of about 400 MB), which seems excessive,
since I don't expect the Map to contain more than around 2M entries.
Also, I had to use "-K100M" to increase the stack space or the program
bombs out with a stack overflow.
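
The -hc flag also left a scratch.hp heap profile behind; if I'm reading
the GHC docs right, I can turn it into a graph with hp2ps:

---------------------------------------------------------------------------------------------------------------------------
hp2ps -c scratch.hp    # produces scratch.ps, coloured by cost centre
---------------------------------------------------------------------------------------------------------------------------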

Since I compiled with profiling information, running the program
produced the following scratch.prof:

---------------------------------------------------------------------------------------------------------------------------
       Wed Mar  3 19:41 2010 Time and Allocation Profiling Report  (Final)

          scratch +RTS -sstderr -hc -p -K100M -RTS

       total time  =       18.68 secs   (934 ticks @ 20 ms)
       total alloc = 5,327,238,992 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

findChainLength                Euler1to20            93.8   90.5
CAF                            Euler1to20             5.2    8.0
problemFourteen                Euler1to20             1.0    1.5



                                                          individual    inherited
COST CENTRE       MODULE                       no.   entries  %time %alloc   %time %alloc

MAIN              MAIN                           1         0    0.0    0.0   100.0  100.0
 CAF              Main                         212         2    0.0    0.0     0.0    0.0
  main            Main                         218         1    0.0    0.0     0.0    0.0
 CAF              GHC.Handle                   164         4    0.0    0.0     0.0    0.0
 CAF              Euler1to20                   139        21    5.2    8.0   100.0  100.0
  findChainLength Euler1to20                   220   3168610   93.8   90.5    93.8   90.5
  problemFourteen Euler1to20                   219         1    1.0    1.5     1.0    1.5
 CAF              Control.Monad.State.Lazy     137         1    0.0    0.0     0.0    0.0
---------------------------------------------------------------------------------------------------------------------------

The above tells me that almost all the CPU time and memory allocation
happens in findChainLength, but that isn't really a surprise.
I think the next step is to start adding cost centers (perhaps with
manual SCC pragmas, as sketched below) to get a more detailed profile,
but this is where I hit the end of my GHC profiling knowledge.  Can
anyone help with some general strategies for how to
use the profiling tools in GHC to track this down? Of course once the
culprit has been found, I have to figure out what to *do* about it,
but one thing at a time...
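
Here is roughly what I had in mind for the extra cost centers (an
untested sketch; the SCC names are just placeholders I made up):

---------------------------------------------------------------------------------------------------------------------------
-- Untested sketch: manual cost centres inside findChainLength so the
-- profile can split the time between the Map lookup, the recursive
-- call, and the insert (on top of what -auto-all already provides).
findChainLength :: Integer -> State (M.Map Integer Integer) Integer
findChainLength n = do
 m <- get
 case ({-# SCC "mapLookup" #-} M.lookup n m) of
   Nothing -> let nextLink = if even n then div n 2 else 3*n+1
              in do
                len <- ({-# SCC "recurse" #-} liftM (1+) (findChainLength nextLink))
                m2 <- get
                m2 `seq` ({-# SCC "mapInsert" #-} put (M.insert n len m2))
                return len
   Just n' -> return n'
---------------------------------------------------------------------------------------------------------------------------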

James

