[Haskell-cafe] Interesting data structure

Thu Dec 27 19:06:13 EST 2007

I'm using a control structure that's a variation of a monad and I'm
interested in whether

    - it's got a name
    - it deserves a name (!)
    - anything else similar is used elsewhere

Please excuse the longer post...

I have two programs that need to interact with the outside world, and
I want to constrain the nature of these interactions. I don't want to
just sprinkle IO throughout the code.

In the first program, I am reading on-demand from a database - just
reading, not making any changes.

In the second, I am requesting computations to be evaluated
externally, in order to take advantage of a grid of machines.

In both of these cases, the external requests don't change the state
of the world, and the programs can be consided "pure" as long as
the world isn't being changed by some other means. Hence the requests
can be reordered, performed in parallel, and optimised in various ways
without affecting the result.

To make this concrete, if an external request has a type like:

    a -> m b

where a is the input to the request, b is the response, and m is some
monad, probably IO. Then a data structure capturing calculations over
these requests, with result type c can be:

    data SCalc a b c = SCResult c
                     | SCStage {
                             sc_req :: a,
                             sc_calc :: b -> SCalc a b c
                       }

The idea is that we either have a result, or we need to make a
external request, whose response is used to generate a new
calculation.

"Running" such a calculation is straightforward:

    runSC :: (Monad m ) => SCalc a b c -> (a -> m b) -> m c
    runSC (SCResult v) _ = return v
    runSC (SCStage a cf) reqf = do
        b <- reqf a
        runSC (cf b) reqf

and calculations can be sequence by making a monad instance:

    instance Monad (SCalc a b) where
       return = SCResult

       (>>=) (SCResult v)    cf = cf v
       (>>=) (SCStage a cf1) cf = SCStage a (\b -> cf1 b >>= cf)

Where it gets interesting, however, is that when they don't depend on
each other, we can run these calculations in parallel. We need the
ability to merge requests, and split responses. Hence,

    class Req a where
       merge :: a -> a -> a

    class Resp b where
       split :: b -> (b,b)

    par :: (Req a, Resp b) => SCalc a b c -> SCalc a b d -> SCalc a b (c,d)

The par primitive above can be used to define other parallel
operations, such as

    parList :: (Req a, Resp b) => [SCalc a b c] -> SCalc a b [c]

I found it worthwhile to try and visualise what's going on here. Let's
say I have 4 calculations that I want to run in parallel. The first
doesn't need a request; the second needs to make a single request
(A1); the third needs to make two requests where the second (B2)
depends on the result of the first (B1), etc. The resulting parallel
operations will be done in 3 batches, looking like:

         batch1  batch2   batch3   result

calc1    ------------------------> V0
calc2    A1      ----------------> V1
calc3    B1      B2       -------> V2
calc4    C1      C2       C3   --> V3

(excuse the ascii art). batch1 will consist of A1,B1,C1 merged
together; batch2 of B2,C3 merged; etc.

In practice, I've used the above data types to abstract out database
access in some reporting code. It works quite well, as use of the
parallel primitive above means that the haskell code talking to the
database sees all of the information in each batch simultaneously, so
it can optimise the queries, remove redundant requests etc. It also
makes the reporting code pure despite the fact that information is
being loaded on demand from the db (without any unsafe calls behind
the scene). I guess the use of the term "pure" here should be
qualified: the impure code has been factored out to a single function
in a different module that has a limited and well defined interface.

I haven't implemented the grid calculation example described above,
though I see that it ought to be able to work similarly, potentially
removing duplicate calculation requests, etc.

So my questions are: does this sort of "monad allowing parallel
evaluation" structure have a name? Is it an existing design pattern
in fp somewhere that I haven't seen?

thanks,

Tim