[Haskell-cafe] state updates and efficiency

Tue Dec 5 16:18:20 EST 2006

I am writing some code with complex nested state.  I have a question
about performance with respect to the State monad and the Reader monad.

This is somewhat long, so a quick summary of the question up front:
Can the compiler optimize out state updates that dont change the
state?

The question inline with the code:

> module Test where
> import Control.Monad.State
> import Control.Monad.Reader

I have some nested state.  This is a simplified example where
we have one record inside another.  In my real-world example there
is more nesting and there are lists and maps involved as well.

> data T1 = T1 { f1 :: Int, f2 :: T2 } deriving(Show)
> data T2 = T2 { f3 :: Int, f4 :: Int } deriving(Show)

I want to build generic modifiers and reuse them often.
A good example is modifying a numeric value:

> adjNum :: (Num a) => a -> State a ()
> adjNum n = modify (+ n)

I'm going to be writing state code for my T1 structure which is my
master structure.  If I'm going to be able to reuse adjNum I am
going to have to run a nested state action inside an enclosing
state monad.  I can build a lifter that does this as long as
I know how to extract the nested state and set it back in the
enclosing state:

> withInnerM :: (o -> i) -> (i -> o -> o) -> State i a -> State o a
> withInnerM gettor settor act = do
>     outer <- get
>     let inner = gettor outer
>         (ret, inner') = runState act inner
>         outer' = settor inner' outer
>     put outer'
>     return ret

Now we can make lifters for each of the fields:

> withF1M = withInnerM f1 (\f r -> r {f1=f})
> withF2M = withInnerM f2 (\f r -> r {f2=f})
> withF3M = withInnerM f3 (\f r -> r {f3=f})
> withF4M = withInnerM f4 (\f r -> r {f4=f})

which lets us write some state code for T1 using building blocks like
adjNum.  For example, the following code will add a value to
f1, add another value to f2's f3 and finally return the value of f2's
f4:

> tweakT1 :: Int -> Int -> State T1 Int
> tweakT1 v1 v3 = do
>   withF1M $ adjNum v1
>   withF2M $ withF3M $ adjNum v3
>   withF2M $ withF4M $ get

My question here has to do with efficiency.  In order to update
f2's f3 a new T2 had to be constructed and used to construct a
new T1.  There's no way around this (I assume).  But when doing
a mere get of f4, there's no reason why we should have to build a
new T2 and then a new T1 since nothign changed.  But that's how
the code is written.

We could write a different lifter that does not perform a state
put on the return path.  If we did this with the state monad then
someone could accidentally use that non-modify version of the lifter
and lose an update.  So instead I chose to use the Reader monad and
let the type system enforce the difference between lifters that
modify (M) and those that just read (R).  This involved two kinds
of lifters.

The first lifter runs a Reader action inside of a State monad:

> withRead :: Reader s a -> State s a
> withRead act = do
>     s <- get
>     return $ runReader act s

The second lifter runs a nested Reader action inside of an enclosing
Reader monad.  It is similar in spirit to withInnerM:

> withInnerR :: (o -> i) -> Reader i a -> Reader o a
> withInnerR gettor act = do
>     i <- asks gettor
>     return $ runReader act i

again we can generate lifters for each field:

> withF1R = withInnerR f1
> withF2R = withInnerR f2
> withF3R = withInnerR f3
> withF4R = withInnerR f4

And finally we can use a reader monad to eliminate the extra state
updates from our previous tweakT1 implementation:

> tweakT1' :: Int -> Int -> State T1 Int
> tweakT1' v1 v3 = do
>   withF1M $ adjNum v1
>   withF2M $ withF3M $ adjNum v3
>   withRead $ withF2R $ withF4R $ ask
>
> main = do
>   let x = T1 1 (T2 3 4)
>   print $ runState (tweakT1 5 6) x
>   print $ runState (tweakT1' 5 6) x

My question here is: is it worth it?

If I run

> v1 = T2 1 1
> v2 = v1 {f3 = 1}

is the compiler smart enough to notice that the record update doesn't
result in a change, and avoid constructing an entirely new T2?
If so, then I think the original implementation TweakT1 would be
about as efficient as the more complicated TweakT1'.  Otherwise,
I think the latter would be a lot more efficient when the state is
large and complex.

Tim Newsham
http://www.thenewsh.com/~newsham/