[Haskell-cafe] Stacking State on State.....
Daniel Fischer
daniel.is.fischer at web.de
Sun Mar 1 16:46:47 EST 2009
Am Sonntag, 1. März 2009 21:03 schrieb Phil:
> Hi,
>
> Thanks for the replies - I haven't had a chance to try out everything
> suggested yet - but your explanations of transformers nailed it for me.
>
> However, in terms of performance when stacking, I've come across something
> I'm struggling to explain - I was wondering if anyone could offer up and
> explanation.
> I've rewritten my code twice - one with 3 stacked monads, and one with 2
> stacked monads and a load of maps. Heuristically I would have thought the
> 3 stacked monads would have performed as well, or even better than the 2
> stacked solution, but the 2 stacked solution is MUCH faster and MUCH less
> memory is used. They are both using 90% of the same code and both chain
> together the same number of computations using replicateM.
Not quite, the triple stack uses replicateM_, whihc should be a little
cheaper.
> From profiling
> I can see that the pure function 'reflect' takes up most of the umph in
> both cases - which I'd expect. But in the triple stacked version the
> garbage collector is using up >90% of the time.
>
> I've tried using BangPatterns to reduce memory usage in the Triple Stack
> version - doing this I can half the time it takes, but it is still running
> at over twice the time of the two stack version. The BangPatterns were
> also put in Common Code in the reflect function - so I'd expect both
> solutions to need them?
One thing that helps much is to use
import Control.Monad.State.Strict
Using the default lazy State monad, you build enormous thunks in the states,
which harms the triple stack even more than the double stack.
With the strict State monad (and a strict left fold instead of foldr in the
double stack), I get
./v6tripleStrict +RTS -sstderr -K16M
10.450674088955589
444,069,720 bytes allocated in the heap
234,808,472 bytes copied during GC (scavenged)
30,504,688 bytes copied during GC (not scavenged)
41,074,688 bytes maximum residency (9 sample(s))
786 collections in generation 0 ( 21.03s)
9 collections in generation 1 ( 2.54s)
106 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.05s ( 4.21s elapsed)
GC time 23.57s ( 24.18s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 27.62s ( 28.40s elapsed)
%GC time 85.3% (85.2% elapsed)
Alloc rate 109,646,844 bytes per MUT second
Productivity 14.7% of total user, 14.3% of total elapsed
./v6doubleStrict +RTS -sstderr
10.450674088955592
388,795,972 bytes allocated in the heap
177,748,228 bytes copied during GC (scavenged)
23,953,900 bytes copied during GC (not scavenged)
44,560,384 bytes maximum residency (9 sample(s))
710 collections in generation 0 ( 11.62s)
9 collections in generation 1 ( 3.03s)
94 Mb total memory in use
INIT time 0.01s ( 0.00s elapsed)
MUT time 13.54s ( 13.91s elapsed)
GC time 14.65s ( 15.02s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 28.20s ( 28.93s elapsed)
%GC time 52.0% (51.9% elapsed)
Alloc rate 28,693,429 bytes per MUT second
Productivity 48.0% of total user, 46.8% of total elapsed
So, yes, the triple stack uses more memory, but not terribly much more.
However, it spends much more time gc'ing, but as its MUT time is much less,
the total times are not much different.
Now,if we give them enough heap space to begin with:
./v6tripleStrict +RTS -sstderr -K16M -H192M
10.450674088955589
444,077,972 bytes allocated in the heap
95,828,976 bytes copied during GC (scavenged)
15,441,936 bytes copied during GC (not scavenged)
36,147,200 bytes maximum residency (2 sample(s))
5 collections in generation 0 ( 2.16s)
2 collections in generation 1 ( 0.43s)
185 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.20s ( 4.55s elapsed)
GC time 2.59s ( 2.74s elapsed)
EXIT time 0.00s ( 0.95s elapsed)
Total time 6.79s ( 7.29s elapsed)
%GC time 38.1% (37.6% elapsed)
Alloc rate 105,732,850 bytes per MUT second
Productivity 61.9% of total user, 57.6% of total elapsed
./v6doubleStrict +RTS -sstderr -K16M -H192M
10.450674088955592
388,806,408 bytes allocated in the heap
46,446,680 bytes copied during GC (scavenged)
77,852 bytes copied during GC (not scavenged)
159,744 bytes maximum residency (2 sample(s))
4 collections in generation 0 ( 1.36s)
2 collections in generation 1 ( 0.03s)
182 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.53s ( 5.11s elapsed)
GC time 1.39s ( 1.44s elapsed)
EXIT time 0.00s ( 0.02s elapsed)
Total time 5.92s ( 6.55s elapsed)
%GC time 23.5% (21.9% elapsed)
Alloc rate 85,829,229 bytes per MUT second
Productivity 76.5% of total user, 69.2% of total elapsed
MUCH better.
I have no idea why the MUT time for the double stack decreases so much,
though.
>
> Even though both pieces of code are a bit untidy, the triple stacked monad
> 'feels' nicer to me - everything is encapsulated away and one evaluation in
> main yields the result. From purely a design perspective I prefer it - but
> obviously not if it runs like a dog!
>
> Any ideas why the triple stack runs so slow?
It suffers horribly from laziness. One thing is the lazy State monad, another
is the implementation of mc.
>
> Thanks again!
>
> Phil
>
> ***************** Triple Stack Specific Impl:
>
> type MonteCarloStateT = StateT Double
>
> mc :: MonteCarloStateT BoxMullerQuasiState ()
> mc = StateT $ \s -> do nextNormal <- generateNormal
> let stochastic = 0.2*1*nextNormal
> let drift = 0.05 - (0.5*(0.2*0.2))*1
> let newStockSum = payOff 100 ( 100 * exp ( drift +
> stochastic ) ) + s
> return ((),newStockSum)
Don't use a new let on each line, have it all in one let-block.
And, please, force the evaluation of newStockSum:
mc :: MonteCarloStateT BoxMullerQuasiState ()
mc = StateT $ \s -> do
nextNormal <- generateNormal
let stochastic = 0.2*1*nextNormal
drift = 0.05 - (0.5*(0.2*0.2))*1
!newStockSum = payOff 100 ( 100 * exp ( drift + stochastic ) ) + s
return ((),newStockSum)
Now:
./v8tripleStrict +RTS -sstderr
10.450674088955589
396,391,172 bytes allocated in the heap
65,252 bytes copied during GC (scavenged)
22,272 bytes copied during GC (not scavenged)
40,960 bytes maximum residency (1 sample(s))
757 collections in generation 0 ( 0.02s)
1 collections in generation 1 ( 0.00s)
1 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.36s ( 3.49s elapsed)
GC time 0.02s ( 0.05s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.38s ( 3.54s elapsed)
%GC time 0.6% (1.3% elapsed)
Alloc rate 117,973,563 bytes per MUT second
Productivity 99.4% of total user, 94.9% of total elapsed
w00t!
>
> iterations = 1000000
> main :: IO()
> main = do let sumOfPayOffs = evalState ( evalStateT ( execStateT (do
> replicateM_ iterations mc) $ 0 ) $ (Nothing,nextHalton) ) $ (1,[3,5])
> let averagePO = sumOfPayOffs / fromIntegral iterations
> let discountPO = averagePO * exp (-0.05)
> print discountPO
>
Again, don't needlessly multiply the lets.
>
> ***************** Double Stack and Map Specific Impl:
>
>
> iterations = 1000000
> main :: IO()
> main = do let normals = evalState ( evalStateT (do replicateM iterations
> generateNormal) $ (Nothing,nextHalton) ) $ (1,[3,5])
> let stochastic = map (0.2*1*) normals
> let sde = map ((( 0.05 - (0.5*(0.2*0.2)) )*1)+) stochastic
> let expiryMult = map exp sde
> let expiry = map (100*) expiryMult
> let payoff = map (payOff 100) expiry
> let averagePO = (foldr (+) 0 payoff) / fromIntegral iterations
> let discountPO = averagePO * exp (-0.05)
> print discountPO
>
Same here, but important for performance is to replace the foldr with foldl'.
>
> ***************** Common Code Used By Both Methods:
>
>
> import Control.Monad.State
> import Debug.Trace
>
> -- State Monad for QRNGs - stores current iteration and list of
> -- bases to compute
> type QuasiRandomState = State (Int,[Int])
>
> nextHalton :: QuasiRandomState [Double]
> nextHalton = do (n,bases) <- get
> let !nextN = n+1
> put (nextN,bases)
> return $ map (reflect (n,1,0)) bases
>
> type ReflectionThreadState = (Int,Double,Double)
>
> reflect :: ReflectionThreadState -> Int -> Double
> reflect (k,f,h) base
>
> | k <= 0 = h
> | otherwise = reflect (newK,newF,newH) base
>
> where
> newK = k `div` base
> newF = f / fromIntegral base
> newH = h + fromIntegral(k `mod` base) * newF
>
> -- So we are defining a state transform which has state of 'maybe double'
> and an
> -- operating function for the inner monad of type QuasiRandomMonad
> returning a [Double]
> -- We then say that it wraps an QuasiRandomMonad (State Monad) - it must of
> course
> -- if we pass it a function that operates on these Monads we must wrap the
> same
> -- type of Monad. And finally it returns a double
>
> type BoxMullerStateT = StateT (Maybe Double, QuasiRandomState [Double])
> type BoxMullerQuasiState = BoxMullerStateT QuasiRandomState
>
> generateNormal :: BoxMullerQuasiState Double
> generateNormal = StateT $ \s -> case s of
> (Just d,qrnFunc) -> return (d,(Nothing,qrnFunc))
> (Nothing,qrnFunc) -> do qrnBaseList <- qrnFunc
> let (norm1,norm2) = boxMuller (head
> qrnBaseList) (head $ tail qrnBaseList)
> return (norm1,(Just norm2,qrnFunc))
>
> boxMuller :: Double -> Double -> (Double,Double)
> -- boxMuller rn1 rn2 | trace ( "rn1 " ++ show rn1 ++ " rn2 " ++ show rn2 )
> False=undefined
> boxMuller rn1 rn2 = (normal1,normal2)
> where
> r = sqrt ( (-2)*log rn1)
> twoPiRn2 = 2 * pi * rn2
> normal1 = r * cos ( twoPiRn2 )
> normal2 = r * sin ( twoPiRn2 )
>
>
>
> payOff :: Double -> Double -> Double
> payOff strike stock | (stock - strike) > 0 = stock - strike
>
> | otherwise = 0
>
Cheers,
Daniel
More information about the Haskell-Cafe
mailing list