[Haskell-cafe] state updates and efficiency
Chris Kuklewicz
haskell at list.mightyreason.com
Wed Dec 6 07:07:58 EST 2006
I'll make a random comment...
Tim Newsham wrote:
> I am writing some code with complex nested state. I have a question
> about performance with respect to the State monad and the Reader monad.
A side comment is this code:
> instance MonadReader s (State s) where
> ask = get
> local f m = fmap (fst . runState m . f) get
or
> local f m = do s <- get
> put (f s)
> a <- m
> put s
> return a
With an instance like this you can mix in generic MonadReader code. Feel free to
replace State with your own instance of MonadState.
> This is somewhat long, so a quick summary of the question up front:
> Can the compiler optimize out state updates that dont change the
> state?
I am not the compiler expert, but I think this is unlikely. For GHC you should
compile with optimizations and dump the Core code to see if it does what you
want. (-ddump-simp option I think).
> The question inline with the code:
>
>> module Test where
>> import Control.Monad.State
>> import Control.Monad.Reader
>
> I have some nested state. This is a simplified example where
> we have one record inside another. In my real-world example there
> is more nesting and there are lists and maps involved as well.
>
>> data T1 = T1 { f1 :: Int, f2 :: T2 } deriving(Show)
>> data T2 = T2 { f3 :: Int, f4 :: Int } deriving(Show)
>
> I want to build generic modifiers and reuse them often.
> A good example is modifying a numeric value:
>
>> adjNum :: (Num a) => a -> State a ()
>> adjNum n = modify (+ n)
This is not much shorthand. But by way of example it is fine.
>
> I'm going to be writing state code for my T1 structure which is my
> master structure. If I'm going to be able to reuse adjNum I am
^^^^^^^^^^^^^
This is a slightly odd engineering goal.
> going to have to run a nested state action inside an enclosing
^^^^^^^^^^^^^^^^^^^^^^^^^
You want performance but this pushes more work to the compiler. If this were
all functional instead of Monadic it might be simpler to start with.
> state monad. I can build a lifter that does this as long as
> I know how to extract the nested state and set it back in the
> enclosing state:
>
>> withInnerM :: (o -> i) -> (i -> o -> o) -> State i a -> State o a
>> withInnerM gettor settor act = do
>> outer <- get
>> let inner = gettor outer
>> (ret, inner') = runState act inner
>> outer' = settor inner' outer
>> put outer'
>> return ret
>
> Now we can make lifters for each of the fields:
>
>> withF1M = withInnerM f1 (\f r -> r {f1=f})
>> withF2M = withInnerM f2 (\f r -> r {f2=f})
>> withF3M = withInnerM f3 (\f r -> r {f3=f})
>> withF4M = withInnerM f4 (\f r -> r {f4=f})
Main *main comment* is to separate manipulating the complex data structure from
the State commands. And to make it more abstract:
get1,get3,get4 :: T1 -> Int
get2 :: T1 -> T2
get1 = f1
get2 = f2
get3 = f3 . get2
get4 = f4 . get2
put1,put3,put4 :: Int -> T1 -> T1
put2 :: T2 -> T1 -> T1
put1 x o = o {f1=x}
put2 x o = o {f2=x}
put3 x o = mod2 (\o2 -> o2 {f3=x}) o
put4 x o = mod2 (\o2 -> o2 {f4=x}) o
mod1,mod3,mod4 :: (Int->Int) -> T1 -> T1
mod2 :: (T2->T2) -> T1 -> T1
mod1 f o = put1 (f (get1 o)) o
mod2 f o = put2 (f (get2 o)) o
mod3 f o = put3 (f (get3 o)) o
mod4 f o = put4 (f (get4 o)) o
And note the different but important design choice:
You don't need to know how the data is nested to access a field.
If you insert a T1'and'a'half data field "between" T1 and T2 then you just need
to update get2/put2 to fix (get|put|mod)(3|4) and this also fixes all the code
that uses any of these functions.
> which lets us write some state code for T1 using building blocks like
> adjNum. For example, the following code will add a value to
> f1, add another value to f2's f3 and finally return the value of f2's
> f4:
>
>> tweakT1 :: Int -> Int -> State T1 Int
>> tweakT1 v1 v3 = do
>> withF1M $ adjNum v1
>> withF2M $ withF3M $ adjNum v3
>> withF2M $ withF4M $ get
The above would break if you added T1'and'a'half since it needs to know the
structure of the data. This is why withF3M is not a good abstraction.
Now for my version of tweakT1:
-- My choice is to use a strict modify that also returns the new value
modify' :: (MonadState a m) => (a -> a) -> m a
modify' f = do x <- liftM f get
put $! x
return x
-- Now tweakT1 can be a one-liner or longer
tweakT1,tweakT1'long :: (MonadState T1 f) => Int -> Int -> f Int
tweakT1 v1 v3 = liftM get4 (modify' (mod1 (+ v1) . mod3 (+ v3)))
tweakT1'long v1 v3 = do
modify (mod1 (+ v1))
modify (mod3 (+ v3))
liftM get4 get
If you want something like adjNum, how about
adjNum' :: (Num a) => a -> ((a->a) -> s->s) -> State s a
adjNum' x mod = modify' (mod (+ x))
tweakT1'adj v1 v3 = do
adjNum' v1 mod1
adjNum' v2 mod3
liftM get4 get
The compiler may or may not optimize the mod1 and mod3 into one T1 construction
instead of two. If you modify parts 1 and 3 of the state in tandem a lot then
mod13 g1 g3 o@(T1 {f1=v1,f2=v2@(T2 {f3=v3})}) = o {f1=g1 v1,f2=v2 {f3=g3 v3}}
will certainly avoid reconstructing T1 twice.
> My question here has to do with efficiency. In order to update
> f2's f3 a new T2 had to be constructed and used to construct a
> new T1. There's no way around this (I assume).
True. The number of T1's and T2's constructed is the issue. Reading the
-dump-simpl Core text is the best way to check. Using {-# INLINE ... #-} could
help.
> But when doing
> a mere get of f4, there's no reason why we should have to build a
> new T2 and then a new T1 since nothign changed. But that's how
> the code is written.
The "run a nested state action" decision combined with always building
on withInnerM with calls put is the culprit.
> We could write a different lifter that does not perform a state
> put on the return path. If we did this with the state monad then
> someone could accidentally use that non-modify version of the lifter
> and lose an update. So instead I chose to use the Reader monad and
> let the type system enforce the difference between lifters that
> modify (M) and those that just read (R). This involved two kinds
> of lifters.
>
> The first lifter runs a Reader action inside of a State monad:
>
>> withRead :: Reader s a -> State s a
>> withRead act = do
>> s <- get
>> return $ runReader act s
That type signature is
withReader :: Reader T1 a -> State T1 a
withReader act = liftM (runReader act) ask
-- or
withReader act = liftM (runReader act) get
but this is not directly helpful. Running nested monads is not the best course.
>
> The second lifter runs a nested Reader action inside of an enclosing
> Reader monad. It is similar in spirit to withInnerM:
>
>> withInnerR :: (o -> i) -> Reader i a -> Reader o a
>> withInnerR gettor act = do
>> i <- asks gettor
>> return $ runReader act i
>
> again we can generate lifters for each field:
>
>> withF1R = withInnerR f1
>> withF2R = withInnerR f2
>> withF3R = withInnerR f3
>> withF4R = withInnerR f4
>
> And finally we can use a reader monad to eliminate the extra state
> updates from our previous tweakT1 implementation:
>
>> tweakT1' :: Int -> Int -> State T1 Int
>> tweakT1' v1 v3 = do
>> withF1M $ adjNum v1
>> withF2M $ withF3M $ adjNum v3
>> withRead $ withF2R $ withF4R $ ask
And that was a very roundabout way to derive
liftM get4 == withRead . withF2R . withF4R
>> main = do
>> let x = T1 1 (T2 3 4)
>> print $ runState (tweakT1 5 6) x
>> print $ runState (tweakT1' 5 6) x
>
> My question here is: is it worth it?
>
> If I run
>
>> v1 = T2 1 1
>> v2 = v1 {f3 = 1}
>
> is the compiler smart enough to notice that the record update doesn't
> result in a change, and avoid constructing an entirely new T2?
> If so, then I think the original implementation TweakT1 would be
> about as efficient as the more complicated TweakT1'. Otherwise,
> I think the latter would be a lot more efficient when the state is
> large and complex.
>
> Tim Newsham
> http://www.thenewsh.com/~newsham/
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list