[Haskell-cafe] Efficient Factoring Out of Constants

Phil pbeadling at mail2web.com
Sat Jan 17 11:46:12 EST 2009


I've been thinking about factoring constants out of iterations and have
spotted another place in my code where I can make use of this.

See the two examples below. The first example iterates over the mcSimulate
function; this has changed a little bit but essentially it still recurses
around, passing in two constants and two variables that change each time it
is called. It has the following form:

mcSimulate (startStock, endTime, seedForSeed, runningSum) = (startStock, endTime, newSeedForSeed, newRunningSum)
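To make that shape concrete, here is a runnable toy stand-in; the seed
increment and the payoff term below are invented placeholders, not the real
simulation:

```haskell
import Data.Word (Word64)

-- Toy stand-in with the shape above: the constants ride along unchanged,
-- while the seed and running sum are updated each call.  The seed update
-- and payoff are placeholders for the real Monte Carlo step.
mcSimulate :: (Double, Double, Word64, Double) -> (Double, Double, Word64, Double)
mcSimulate (startStock, endTime, seedForSeed, runningSum) =
    (startStock, endTime, newSeedForSeed, newRunningSum)
  where
    newSeedForSeed = seedForSeed + 246524
    newRunningSum  = runningSum + startStock * endTime

main :: IO ()
main = print (iterate mcSimulate (100, 1, 1, 0) !! 3)
-- prints (100.0,1.0,739573,300.0): the first two components never change
```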

I figured I'm passing around the constants startStock and endTime, so I
looked to factor these out, producing the second example below.

My concern is that although the iterate function now only handles two
variables, it's still passing around one tuple, which under the bonnet is
likely to be represented in machine code as a pointer. Humor me here a
little: I know I'm thinking of this in terms of C++, but I'm guessing the
final machine code will adhere to this:

Thus each time mcSimulate is called, a machine code subroutine will be
passed a memory address to the input data.  Now, the crux of this is: will
it make a COPY of the old input data, BUT with the newSeedForSeed and
newRunningSum, to pass to the next iteration?  If this is the case, each
iteration does two useless copies of startStock and endTime.  Here the
second example should produce better code, because nothing constant is copied
from one iteration to the next.  However, if the compiler is smarter and simply
REPLACES the modified data, the second example buys us nothing.
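For what it's worth, my understanding of GHC is that each step allocates a
fresh tuple constructor, but the unchanged fields are just pointers shared
with the old tuple, so startStock and endTime are never copied as values.
If you actually wanted the raw Doubles stored inline in the constructor
(and therefore genuinely copied each step), you would use a strict record
with UNPACK pragmas. A sketch, where the type, field names and step body
are all invented for illustration:

```haskell
import Data.Word (Word64)

-- Strict, unpacked record: each new constructor stores the raw Double
-- values inline, so even unchanged fields are copied on every step.
-- With a plain lazy tuple, GHC instead shares pointers to the old
-- boxed values.  (Names and the step body are invented.)
data SimState = SimState
  { ssStartStock :: {-# UNPACK #-} !Double
  , ssEndTime    :: {-# UNPACK #-} !Double
  , ssSeed       :: {-# UNPACK #-} !Word64
  , ssSum        :: {-# UNPACK #-} !Double
  }

step :: SimState -> SimState
step s = s { ssSeed = ssSeed s + 246524   -- stand-in seed update
           , ssSum  = ssSum s + 1 }       -- stand-in payoff

main :: IO ()
main = do
  let SimState _ _ sd sm = iterate step (SimState 100 1 1 0) !! 3
  print (sd, sm)
-- prints (739573,3.0)
```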

However, again, depending very much on the compiler, the second example may
actually be less efficient.  Let's say the compiler is super smart and
doesn't copy around the startStock and endTime on each iteration in the
first example.  Then we've gained nothing, and the second example will
call 3 functions each iteration:

mcWrapper -> mcSimulate -> getStateInfo

In the second example we probably have something like 6 'JMP' statements in
machine code: 3 to jump into each function, and 3 to jump back out.  In
the first we have 2, one to jump us into mcSimulate and one to return.  So
each iteration executes 4 more JMPs in the second example.  All other
things being equal, this will produce slightly less efficient code.

Now I know I'm speculating like crazy, and perhaps I'm drunk with efficiency
here, but it would seem to me that whichever way it works there will be a
certain critical mass of constant data that you can carry around, beyond
which (i.e. when the copy operations exceed the CPU time taken for the 4
extra JMPs) you will be better off factoring the constant data out... that
is, assuming any of my analysis is correct :-)
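Short of reading the Core, the honest answer is to measure both styles,
e.g. with the criterion library. As a starting point, here is a minimal
sanity check that the threaded-constants and factored-constants
formulations compute the same thing (the step bodies are stand-ins for the
real simulation), which you could then drop into a benchmark harness:

```haskell
import Data.Word (Word64)

-- Style 1: constants threaded through the state tuple every iteration.
stepTuple :: (Double, Double, Word64, Double) -> (Double, Double, Word64, Double)
stepTuple (s0, t, seed, acc) = seq seed' (seq acc' (s0, t, seed', acc'))
  where
    seed' = seed + 246524   -- stand-in seed update
    acc'  = acc + s0 * t    -- stand-in payoff

-- Style 2: constants captured once in a closure; only the pair is threaded.
stepPair :: Double -> Double -> (Word64, Double) -> (Word64, Double)
stepPair s0 t (seed, acc) = seq seed' (seq acc' (seed', acc'))
  where
    seed' = seed + 246524
    acc'  = acc + s0 * t

main :: IO ()
main = do
  let (_, _, sA, accA) = iterate stepTuple (100, 1, 1, 0) !! 10000
      (sB, accB)       = iterate (stepPair 100 1) (1, 0) !! 10000
  print (sA == sB && accA == accB)   -- both styles must agree
-- prints True
```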

If anyone has any insight into how this might look once compiled down to
machine code, or has an opinion on which example below makes for better
Haskell, I'd be grateful for any comments, advice or discussion.



Note: I recognize the use of getSum and getStateInfo could be polished
using data structures instead, and the use of !! will not produce strict
evaluation.

getSum :: (Double, Double, Word64, Double) -> Double
getSum (startStock, endTime, seedForSeed, runningSum) = runningSum

getAveragePayoff :: Double -> Double -> Int -> Word64 -> Double
getAveragePayoff startStock endTime iterations seedForSeed = average
  where
    average = (getSum $ (iterate mcSimulate (startStock, endTime, seedForSeed, 0)) !! iterations) / fromIntegral iterations


getStateInfo :: (Double, Double, Word64, Double) -> (Word64, Double)
getStateInfo (startStock, endTime, seedForSeed, runningSum) = (seedForSeed, runningSum)

getAveragePayoff :: Double -> Double -> Int -> Word64 -> Double
getAveragePayoff startStock endTime iterations seedForSeed = average
  where
    average = (snd $ (iterate mcWrapper (seedForSeed, 0)) !! iterations) / fromIntegral iterations
    mcWrapper = \(seed, runningSum) -> getStateInfo $ mcSimulate (startStock, endTime, seed, runningSum)
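For comparison, the constants can also be factored out without the wrapper
lambda, by using a local worker loop whose free variables are the
constants, so they are captured once and never re-passed. A sketch: the
name getAveragePayoff' is my own, and the seed update and payoff are
placeholders for the real simulation step.

```haskell
import Data.Word (Word64)

-- Hypothetical variant: a local worker loop captures startStock and
-- endTime as free variables instead of threading them through the state.
getAveragePayoff' :: Double -> Double -> Int -> Word64 -> Double
getAveragePayoff' startStock endTime iterations seedForSeed =
    go iterations (seedForSeed, 0) / fromIntegral iterations
  where
    go :: Int -> (Word64, Double) -> Double
    go 0 (_, runningSum) = runningSum
    go n (seed, runningSum) =
        go (n - 1) (seed + 246524, runningSum + startStock * endTime)
        -- stand-in seed update and payoff

main :: IO ()
main = print (getAveragePayoff' 100 1 10 1)
-- prints 100.0
```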

On 16/01/2009 01:41, "Phil" <pbeadling at mail2web.com> wrote:

> On 16/01/2009 01:28, "Luke Palmer" <lrpalmer at gmail.com> wrote:
>>> Compile-time constants could be handled by simple top-level bindings.  This
>>> technique is specifically for the case you are after:
>>> mcSimulate :: Double -> Double -> Word64 -> [Double]
>>> mcSimulate startStock endTime seedForSeed = go seedForSeed
>>>   where
>>>     go seed = fst expiryStock : go newSeed
>>>       where
>>>         expiryStock = iterate evolveUnderlying (startStock, ranq1Init seed)
>>>                           !! truncate (endTime/timeStep)
>>>         newSeed = seed + 246524
>>> See what's going on there?
>>> I don't know about that nested where.  In Real Life I would probably use a
>>> let instead for expiryStock and newSeedForSeed.
>>> Luke
> Ahh, I get it now, that's pretty neat - 'go' is only updating the
> seedForSeed and the expiryStock; the inner 'where' clause keeps everything
> else constant each time it is called.
> Thanks again!
> Phil.
