[Haskell-cafe] using mutable data structures in pure functions

Wed Mar 14 05:36:42 CET 2012

On Sun, Mar 11, 2012 at 8:38 PM, E R <pc88mxer at gmail.com> wrote:

> A pure function can allocate and modify memory as long as a) it never
> returns a reference to the memory or b) it never again modifies the
> memory once it returns (i.e. it returns an immutable object).
>

That's a reasonable first approximation to the problem, yes.  It gets a bit
more complicated due to laziness (what if the mutation gets delayed until
some later part of the output is examined?)

So, again, what is the Haskell philosophy towards using mutable data
> structures in pure functions? Is it:
>
> 1. leave it to the compiler to find these kinds of opportunities
> 2. just use the immutable data structures - after all they are just as
> good (at least asymptotically)
> 3. you don't want to use mutable data structures because of _____
> 4. it does happen, and some examples are ______
>

There are two ways in which Haskell encourages the use of mutable data
structures in a pure way.

The first is in the inherent mutation caused by laziness.  For example:

type Positive = Integer

-- trie of binary representation of positive number
-- [1] -> tOne
-- x ++ [0] -> lookup x . tEven
-- x ++ [1] -> lookup x . tOdd
data Trie a = Trie {
    tOne :: a,
    tEven :: NatTrie a,
    tOdd :: NatTrie a
    }

lookupTrie :: Trie a -> Positive -> a
lookupTrie t 1 = tOne t
lookupTrie t n
    | even n = lookupTrie (tEven t) (n `div` 2)
    | otherwise = lookupTrie (tOdd t) (n `div` 2) -- div drops remainder

makeTrie :: (Positive -> a) -> Trie a
makeTrie f = Trie (f 1) e o where
    e = makeTrie $ \n -> f (2*n)
    o = makeTrie $ \n -> f (2*n + 1)

memoize :: (Positive -> a) -> (Positive -> a)
memoize = lookupTrie . makeTrie

collatz_rec :: (Positive -> Integer) -> Positive -> Integer
collatz_rec f 1 = 0
collatz_rec f n
    | even n = 1 + f (n `div` 2)
    | otherwise = 1 + f (3*n + 1)

collatz = memoize (collatz_rec collatz)

In this case, makeTrie creates a thunk, and it's only evaluated where
requested by lookupTrie.  You can call collatz at many different values and
later calls will be much faster, as the mutation caused by lazy evaluation
'remembers' the values.

The second is by explicitly documenting that you are using a temporarily
mutable structure, which is the ST monad:

instance Monad (ST s)
newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
-- and similar interface for mutable STArrays

runST :: (forall s. ST s a) -> a -- note higher rank type

A computation in the ST monad is an impure computation that can modify
memory, but only memory allocated within that same computation.

The higher rank type in runST makes it safe to do so--references from one
ST computation cannot escape to any other ST computation.  So even though
internally some pure value might rely on an impure computation, it's safe
to do so in a pure context.

Here's a sample implementation of ST:

-- DO NOT EXPORT THESE CONSTRUCTORS
newtype ST s a = ST (IO a)
newtype STRef s a = STRef { getRef :: IORef a }

runST :: (forall s. ST s a) -> a
runST (ST act) = unsafePerformIO act -- Actually safe!

newSTRef :: a -> ST s (STRef s a)
newSTRef a  = ST $ liftM STRef (newIORef a)

readSTRef :: STRef s a -> ST s a
readSTRef (STRef r) = ST $ readIORef r

writeSTRef :: STRef s a -> a -> ST s ()
writeSTRef (STRef r) a = ST $ writeIORef r a

There's some usage examples at http://www.haskell.org/haskellwiki/Monad/ST

  -- ryan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120313/cd4e8bbf/attachment.htm>