[Haskell-beginners] What is the functional way of implementing a function that takes a long time to execute?

Wed Apr 24 00:55:48 CEST 2013

Hello Ertugrul,

Thank you for your detailed explanation. I am not sure that I understand all of its subtleties.

I am wondering if you would be willing to provide us with a simple, complete example that we can try out on our own machines? 

/Roger

-----Original Message-----
From: beginners-bounces at haskell.org [mailto:beginners-bounces at haskell.org] On Behalf Of Ertugrul Söylemez
Sent: Tuesday, April 23, 2013 9:29 AM
To: beginners at haskell.org
Subject: Re: [Haskell-beginners] What is the functional way of implementing a function that takes a long time to execute?

"Costello, Roger L." <costello at mitre.org> wrote:

> Suppose a function takes a long time to do its work.
>
> Perhaps it takes minutes or even hours to complete.
>
> While it is crunching along, it would be nice to have some insight
> into its status such as (1) how close is it to completing? (2) what
> part of the task is it currently working on?
>
> It might even be nice to be notified when it is finished.
>
> What is the functional way of implementing the function?

The traditional way in Haskell to see intermediate results is not to
produce the final result only, but to produce a list of intermediate
results, the last of which is the final result.  The trick is to apply
what we call corecursion, which basically means:  Wrap the recursion in
an (ideally nonstrict) data constructor cell.  The function

    sqrtApprox :: Rational -> Rational

then becomes:

    sqrtApprox :: Rational -> [Rational]

You know that you have done it properly if you used proper corecursion,
which looks similar to this:

    loop x y = (x, y) : loop x' y'
        where
        x' = {- ... -}
        y' = {- ... -}

The important part is that the recursion is the right argument of the
constructor (:) and that the constructor application is the last thing
that happens.  You can make sure that you have done it properly and even
get some nice deforestation optimizations by using one of the predefined
corecursion operators like 'unfoldr', 'iterate', etc.

Sometimes you'll want to encode an algorithm even as a composition of
predefined corecursive formulas like 'map', 'filter', 'tails', etc.  For
example you may have a list [1,2,3,4,5,6,7,8,9] to begin with and you
want to encode the sum of the products of three consecutive values,
1*2*3 + 4*5*6 + 7*8*9 + 10:

    takeWhile (not . null) . map (take 3) . iterate (drop 3)

Now how do we produce online statistics while this algorithm calculates
its result?  This is the easiest part, but there is a catch.  You have a
corecursively produced list, which ensures that the list is lazy enough.
All you have to do now is to consume the list as part of an IO action.
The foldM combinator is most helpful here:

    sumStats :: (Num a) => [a] -> IO a
    sumStats = foldM f 0
        where
        f s x = do
            putStrLn ("Sum so far: " ++ s)
            return (s + x)

As said there is a catch.  In your recursive consumer you actually have
to make sure that the intermediate result is actually calculated.  This
is important, because otherwise your consumer doesn't actually force the
calculation, but really just builds up a large unevaluated expression,
which is only evaluated at the very end.

The easiest way to ensure this is to just do what I did in sumStats:
Print the intermediate result (you may call it 'state') or some value
derived from it (make sure that the value depends on the entire state).
If you don't want to perform some IO action with the state you can also
just be strict.  There are many ways to be strict, my favorite being

    f s x = do
        {- ... -}
        return $! s + x

but you can also use the BangPatterns extension.

The basic idea of all this is that you turn an opaque monolithic formula
into a stream processing formula.  This not only makes it more flexible,
but may also help you understand the original problem better and split
it into independent modules.  Remember that you can always compose
stream processors using regular function composition as done above.

Once you are comfortable with using lists for stream processing you can
also use an actual stream processing abstraction.  My personal favorite
right now is the 'pipes' library, but there are other useful libraries
including the other modern library 'conduit' as well as the traditional
'enumerator' library.

This of course does not end with streams.  Streams are for algorithms
with a linear execution paths, but not every algorithm follows that.  If
your formula naturally follows multiple paths, there is nothing wrong
with corecursively producing and recursively consuming trees or graphs,
for example.  You just need unfoldTree+foldTreeM or
unfoldGraph+foldGraphM for that.  This works with every algebraic data
structure.

As a final remark, don't worry about the performance.  Haskell is a lazy
language.  Done properly at least the intermediate lists will be
optimized away by the compiler and the resulting machine code is close
to what a C compiler would have produced for the original monolithic
formula randomly interspersed with statistics printing commands.

Greets
Ertugrul

-- 
Not to be or to be and (not to be or to be and (not to be or to be and
(not to be or to be and ... that is the list monad.