[Haskell-cafe] A Monad for on-demand file generation?

Mon Jun 30 08:08:31 EDT 2008

On Mon, 2008-06-30 at 12:04 +0200, Joachim Breitner wrote:
> Hi,
> 
> for an application such as an image gallery generator, which works on a
> bunch of input files (that are assumed to be constant during one run of
> the program) and generates or updates a bunch of output files, I often
> had the problem of manually tracking which input files a certain output
> file depends on, in order to check the timestamps and decide whether it
> is necessary to re-create the file.
> 
> I thought for a while about how to do this with a monad that does the
> bookkeeping for me. Assuming it’s called ODIO (On demand IO), I’d like a
> piece of code like this:
> 
> do file1 <- readFileOD "someInput"
>    file2 <- readFileOD "someOtherInput"
>    writeFileOD "someOutput" (someComplexFunction file1 file2)
> 
> to only actually read "someInput" and "someOtherInput", do the
> calculation and write the output if these have newer time stamps than
> the output.
> 
> The problem I stumbled over is that the type of >>=
>  (>>=) :: Monad m => m a -> (a -> m b) -> m b
> means that I cannot „look ahead“ to see what files would be written
> without actually reading the requested file. Of course looking ahead is
> not always possible, although I expect code like this to be the
> exception:
> 
> do file1 <- readFileOD "someInput"
>    file2 <- readFileOD "someOtherInput"
>    let filename = decideFileNameBasedOn file2
>    writeFileOD filename (someComplexFunction file1 file2)
> 
> But assuming that the input does not change during one run of the
> program, it should be safe to use "unsafeInterleaveIO" to only open and
> read the input when used. Then, the readFileOD could put the timestamp
> of the read file in a Monad-local state and the writeFileOD could, if
> the output is newer than all inputs listed in the state, skip the
> writing and thus the unsafeInterleaveIO’ed file reads are skipped as
> well, if they were not required for deciding the flow of the program.
> 
> One nice thing is that the implementation of (>>) knows that files read
> in the first action will not affect files written in the second, so in
> contrast to MonadState, we can forget about them, which I hope leads to
> quite good guesses as to what files are relevant for a certain
> writeFileOD operation. Also, a function
>   cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a
> can be used to write an (expensive) intermediate result, such as the
> extracted exif information from a file, to disk, so that it can be used
> without actually re-reading the large image file.
> 
> Is that a sane idea?
> 
> I’m also considering using this example for a talk about monads at the
> GPN¹ next weekend.

You may want to look at Magnus Carlsson's "Monads for Incremental
Computing" http://citeseer.comp.nus.edu.sg/619122.html
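
For concreteness, the scheme described in the quoted message could be
sketched roughly as below. This is only a minimal sketch under stated
assumptions, not a worked-out design: the representation of ODIO as a
state-passing IO action over a list of input timestamps, and the names
runODIO, isNewerThanAll and demo, are hypothetical and do not come from
the original post. It assumes a version of the directory package whose
getModificationTime returns a UTCTime, and it leaves out both the (>>)
refinement and cacheResultOD.

```haskell
import Control.Monad (unless, when)
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime, removeFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical representation: a state-passing IO action whose state is
-- the list of modification times of the inputs requested so far.
newtype ODIO a = ODIO { unODIO :: [UTCTime] -> IO (a, [UTCTime]) }

instance Functor ODIO where
  fmap f m = ODIO $ \ts -> do
    (a, ts') <- unODIO m ts
    return (f a, ts')

instance Applicative ODIO where
  pure a = ODIO $ \ts -> return (a, ts)
  mf <*> ma = ODIO $ \ts -> do
    (f, ts1) <- unODIO mf ts
    (a, ts2) <- unODIO ma ts1
    return (f a, ts2)

instance Monad ODIO where
  m >>= k = ODIO $ \ts -> do
    (a, ts') <- unODIO m ts
    unODIO (k a) ts'

-- Run an ODIO action starting with no recorded inputs.
runODIO :: ODIO a -> IO a
runODIO m = fst <$> unODIO m []

-- Record the input's timestamp now, but defer opening and reading the
-- file until the contents are actually demanded.
readFileOD :: FilePath -> ODIO String
readFileOD path = ODIO $ \ts -> do
  t <- getModificationTime path
  contents <- unsafeInterleaveIO (readFile path)
  return (contents, t : ts)

-- True if the file exists and is strictly newer than every given time.
isNewerThanAll :: FilePath -> [UTCTime] -> IO Bool
isNewerThanAll path inputs = do
  exists <- doesFileExist path
  if not exists
    then return False
    else do
      tOut <- getModificationTime path
      return (all (< tOut) inputs)

-- Skip the write (and hence never force the lazy inputs) when the
-- output is already newer than all inputs recorded so far.
writeFileOD :: FilePath -> String -> ODIO ()
writeFileOD path contents = ODIO $ \ts -> do
  fresh <- isNewerThanAll path ts
  unless fresh (writeFile path contents)
  return ((), ts)

-- Usage example with made-up file names: starts from a missing output,
-- so the write is always performed, then returns what was written.
demo :: IO String
demo = do
  writeFile "od-in1.txt" "foo"
  writeFile "od-in2.txt" "bar"
  stale <- doesFileExist "od-out.txt"
  when stale (removeFile "od-out.txt")
  runODIO $ do
    file1 <- readFileOD "od-in1.txt"
    file2 <- readFileOD "od-in2.txt"
    writeFileOD "od-out.txt" (file1 ++ file2)
  readFile "od-out.txt"
```

The strict comparison in isNewerThanAll is a deliberately conservative
choice: when an input and the output carry the same timestamp, the
output is regenerated rather than trusted. As in the second example of
the quoted message, any input whose contents decide the control flow is
still read, because inspecting the contents forces the deferred read.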