[Haskell-cafe] state updates and efficiency

Chris Kuklewicz haskell at list.mightyreason.com
Wed Dec 6 07:07:58 EST 2006


I'll make a random comment...

Tim Newsham wrote:
> I am writing some code with complex nested state.  I have a question
> about performance with respect to the State monad and the Reader monad.

A side comment is this code:

> instance MonadReader s (State s) where
>   ask = get

>   local f m = fmap (fst . runState m  . f) get

or

>   local f m = do s <- get
>                  put (f s)
>                  a <- m
>                  put s
>                  return a

With an instance like this you can mix in generic MonadReader code. Feel free to
replace State with your own instance of MonadState.

> This is somewhat long, so a quick summary of the question up front:
> Can the compiler optimize out state updates that dont change the
> state?

I am not the compiler expert, but I think this is unlikely.  For GHC you should
compile with optimizations and dump the Core code to see if it does what you
want.  (-ddump-simp option I think).

> The question inline with the code:
> 
>> module Test where
>> import Control.Monad.State
>> import Control.Monad.Reader
> 
> I have some nested state.  This is a simplified example where
> we have one record inside another.  In my real-world example there
> is more nesting and there are lists and maps involved as well.
> 
>> data T1 = T1 { f1 :: Int, f2 :: T2 } deriving(Show)
>> data T2 = T2 { f3 :: Int, f4 :: Int } deriving(Show)
> 
> I want to build generic modifiers and reuse them often.
> A good example is modifying a numeric value:
> 
>> adjNum :: (Num a) => a -> State a ()
>> adjNum n = modify (+ n)

This is not much shorthand.  But by way of example it is fine.

> 
> I'm going to be writing state code for my T1 structure which is my
> master structure.  If I'm going to be able to reuse adjNum I am
                                                ^^^^^^^^^^^^^
This is a slightly odd engineering goal.
> going to have to run a nested state action inside an enclosing
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
You want performance but this pushes more work to the compiler.  If this were
all functional instead of Monadic it might be simpler to start with.
> state monad.  I can build a lifter that does this as long as
> I know how to extract the nested state and set it back in the
> enclosing state:
> 
>> withInnerM :: (o -> i) -> (i -> o -> o) -> State i a -> State o a
>> withInnerM gettor settor act = do
>>     outer <- get
>>     let inner = gettor outer
>>         (ret, inner') = runState act inner
>>         outer' = settor inner' outer
>>     put outer'
>>     return ret
> 
> Now we can make lifters for each of the fields:
> 
>> withF1M = withInnerM f1 (\f r -> r {f1=f})
>> withF2M = withInnerM f2 (\f r -> r {f2=f})
>> withF3M = withInnerM f3 (\f r -> r {f3=f})
>> withF4M = withInnerM f4 (\f r -> r {f4=f})

Main *main comment* is to separate manipulating the complex data structure from
the State commands.  And to make it more abstract:

get1,get3,get4 :: T1 -> Int
get2 :: T1 -> T2
get1 = f1
get2 = f2
get3 = f3 . get2
get4 = f4 . get2

put1,put3,put4 :: Int -> T1 -> T1
put2 :: T2 -> T1 -> T1
put1 x o = o {f1=x}
put2 x o = o {f2=x}
put3 x o = mod2 (\o2 -> o2 {f3=x}) o
put4 x o = mod2 (\o2 -> o2 {f4=x}) o

mod1,mod3,mod4 :: (Int->Int) -> T1 -> T1
mod2 :: (T2->T2) -> T1 -> T1
mod1 f o = put1 (f (get1 o)) o
mod2 f o = put2 (f (get2 o)) o
mod3 f o = put3 (f (get3 o)) o
mod4 f o = put4 (f (get4 o)) o

And note the different but important design choice:
  You don't need to know how the data is nested to access a field.

If you insert a T1'and'a'half data field "between" T1 and T2 then you just need
to update get2/put2 to fix (get|put|mod)(3|4) and this also fixes all the code
that uses any of these functions.

> which lets us write some state code for T1 using building blocks like
> adjNum.  For example, the following code will add a value to
> f1, add another value to f2's f3 and finally return the value of f2's
> f4:
> 
>> tweakT1 :: Int -> Int -> State T1 Int
>> tweakT1 v1 v3 = do
>>   withF1M $ adjNum v1
>>   withF2M $ withF3M $ adjNum v3
>>   withF2M $ withF4M $ get

The above would break if you added T1'and'a'half since it needs to know the
structure of the data.  This is why withF3M is not a good abstraction.

Now for my version of tweakT1:

-- My choice is to use a strict modify that also returns the new value
modify' :: (MonadState a m) => (a -> a) -> m a
modify' f = do x <- liftM f get
               put $! x
               return x

-- Now tweakT1 can be a one-liner or longer
tweakT1,tweakT1'long :: (MonadState T1 f) => Int -> Int -> f Int
tweakT1 v1 v3 = liftM get4 (modify' (mod1 (+ v1) . mod3 (+ v3)))

tweakT1'long v1 v3 = do
  modify (mod1 (+ v1))
  modify (mod3 (+ v3))
  liftM get4 get

If you want something like adjNum, how about

adjNum' :: (Num a) => a -> ((a->a) -> s->s) -> State s a
adjNum' x mod = modify' (mod (+ x))

tweakT1'adj v1 v3 = do
  adjNum' v1 mod1
  adjNum' v2 mod3
  liftM get4 get

The compiler may or may not optimize the mod1 and mod3 into one T1 construction
instead of two. If you modify parts 1 and 3 of the state in tandem a lot then

mod13 g1 g3 o@(T1 {f1=v1,f2=v2@(T2 {f3=v3})}) = o {f1=g1 v1,f2=v2 {f3=g3 v3}}

will certainly avoid reconstructing T1 twice.

> My question here has to do with efficiency.  In order to update
> f2's f3 a new T2 had to be constructed and used to construct a
> new T1.  There's no way around this (I assume).

True.  The number of T1's and T2's constructed is the issue.  Reading the
-dump-simpl Core text is the best way to check.  Using {-# INLINE ... #-} could
help.

> But when doing
> a mere get of f4, there's no reason why we should have to build a
> new T2 and then a new T1 since nothign changed.  But that's how
> the code is written.

The "run a nested state action" decision combined with always building
on withInnerM with calls put is the culprit.

> We could write a different lifter that does not perform a state
> put on the return path.  If we did this with the state monad then
> someone could accidentally use that non-modify version of the lifter
> and lose an update.  So instead I chose to use the Reader monad and
> let the type system enforce the difference between lifters that
> modify (M) and those that just read (R).  This involved two kinds
> of lifters.
> 
> The first lifter runs a Reader action inside of a State monad:
> 
>> withRead :: Reader s a -> State s a
>> withRead act = do
>>     s <- get
>>     return $ runReader act s

That type signature is

withReader :: Reader T1 a -> State T1 a
withReader act = liftM (runReader act) ask
-- or
withReader act = liftM (runReader act) get

but this is not directly helpful.  Running nested monads is not the best course.

> 
> The second lifter runs a nested Reader action inside of an enclosing
> Reader monad.  It is similar in spirit to withInnerM:
> 
>> withInnerR :: (o -> i) -> Reader i a -> Reader o a
>> withInnerR gettor act = do
>>     i <- asks gettor
>>     return $ runReader act i
> 
> again we can generate lifters for each field:
> 
>> withF1R = withInnerR f1
>> withF2R = withInnerR f2
>> withF3R = withInnerR f3
>> withF4R = withInnerR f4
> 
> And finally we can use a reader monad to eliminate the extra state
> updates from our previous tweakT1 implementation:
> 
>> tweakT1' :: Int -> Int -> State T1 Int
>> tweakT1' v1 v3 = do
>>   withF1M $ adjNum v1
>>   withF2M $ withF3M $ adjNum v3
>>   withRead $ withF2R $ withF4R $ ask

And that was a very roundabout way to derive

liftM get4 == withRead . withF2R . withF4R

>> main = do
>>   let x = T1 1 (T2 3 4)
>>   print $ runState (tweakT1 5 6) x
>>   print $ runState (tweakT1' 5 6) x
> 
> My question here is: is it worth it?
> 
> If I run
> 
>> v1 = T2 1 1
>> v2 = v1 {f3 = 1}
> 
> is the compiler smart enough to notice that the record update doesn't
> result in a change, and avoid constructing an entirely new T2?
> If so, then I think the original implementation TweakT1 would be
> about as efficient as the more complicated TweakT1'.  Otherwise,
> I think the latter would be a lot more efficient when the state is
> large and complex.
> 
> Tim Newsham
> http://www.thenewsh.com/~newsham/
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list