[Haskell-cafe] A Monad for on-demand file generation?

Mon Jun 30 19:54:29 EDT 2008

Some comments:

1) unsafeInterleaveIO seems like a big hammer to use for this problem,
and there are a lot of gotchas involved that you may not have fully
thought out.  But you do meet the main criterion (the files being read
are assumed to be constant for a single run of the program).

If you have the ability to store metadata about the computation along
with the computation results, maybe that would be a better solution?
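
For instance, the metadata could record the modification time each input had when the result was computed, stored alongside the result. Below is a minimal sketch of that idea. Cached and reusable are hypothetical names. Comparing timestamps for exact equality, rather than "newer than", is only one possible policy.

```haskell
import Data.Time.Clock (UTCTime (..))

-- Hypothetical record kept next to a computed result: the value plus
-- the modification time each input had when the value was computed.
data Cached a = Cached
  { cachedInputs :: [(FilePath, UTCTime)]
  , cachedValue  :: a
  }

-- The stored value can be reused if every input it was computed from
-- still has exactly the recorded modification time.
reusable :: [(FilePath, UTCTime)] -> Cached a -> Bool
reusable current c = all unchanged (cachedInputs c)
  where
    unchanged (path, t) = lookup path current == Just t
```

On a later run, the current times would come from System.Directory.getModificationTime.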

2) I agree with Luke that this "smells" more like an applicative
functor.  But getting monad syntax is quite nice if you can do so.
As an applicative functor you would have "writeFileOD :: FilePath ->
ODIO ByteString -> ODIO ()"; then writeFileOD can handle all the
necessary timestamp checking itself, and you get the bonus
guarantee that the contents of the files read by the "ODIO ByteString"
argument cannot affect the name of the file you are writing to.
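
Here is a minimal sketch of the applicative variant. It assumes ODIO is just a statically known list of input files paired with the IO action that produces the value. The field names odDeps and odRun are hypothetical. Because <*> only concatenates the dependency lists, writeFileOD can consult them before any input is read.

```haskell
import Control.Monad (when)
import qualified Data.ByteString as B
import System.Directory (doesFileExist, getModificationTime)

-- Hypothetical applicative ODIO: the input files are listed statically,
-- so they can be inspected without running any IO.
data ODIO a = ODIO
  { odDeps :: [FilePath]  -- input files this value depends on
  , odRun  :: IO a        -- action that actually produces the value
  }

instance Functor ODIO where
  fmap f (ODIO ds io) = ODIO ds (fmap f io)

instance Applicative ODIO where
  pure x = ODIO [] (pure x)
  ODIO ds1 f <*> ODIO ds2 x = ODIO (ds1 ++ ds2) (f <*> x)

readFileOD :: FilePath -> ODIO B.ByteString
readFileOD path = ODIO [path] (B.readFile path)

-- Rewrite 'out' only if it is missing or older than one of the inputs
-- of 'content'; only in that case are the inputs actually read.
writeFileOD :: FilePath -> ODIO B.ByteString -> ODIO ()
writeFileOD out content = ODIO [] $ do
  exists <- doesFileExist out
  stale <-
    if not exists
      then return True
      else do
        outTime <- getModificationTime out
        inTimes <- mapM getModificationTime (odDeps content)
        return (any (> outTime) inTimes)
  when stale (odRun content >>= B.writeFile out)
```

The dependency list of (<>) <$> readFileOD "a" <*> readFileOD "b" is ["a", "b"] before anything runs. A Monad instance cannot provide this information, because the continuation passed to >>= is an opaque function.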

3) Instead of (Read,Show), look into Data.Binary if you actually care
about efficiency.  Parsing text at read time will almost never be
faster than just performing the computation on the source data again.
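
Here is a minimal sketch of what that could look like. It uses encode/decode from Data.Binary (binary package) in place of show/read. ExifSummary is a hypothetical stand-in for the cached intermediate result.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical intermediate result, e.g. a few extracted EXIF fields.
type ExifSummary = [(String, Int)]

-- Compact binary encoding instead of 'show'.
toCache :: ExifSummary -> BL.ByteString
toCache = encode

-- Binary decoding instead of parsing text with 'read'.
-- Note: 'decode' throws an error on malformed input.
fromCache :: BL.ByteString -> ExifSummary
fromCache = decode
```

Data.Binary also provides encodeFile and decodeFile for writing such values directly to and from disk. It also offers decodeOrFail as a non-throwing alternative to decode.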

  -- ryan

On 6/30/08, Joachim Breitner <mail at joachim-breitner.de> wrote:
> Hi,
>
> for an application such as an image gallery generator, which works on a
> bunch of input files (assumed to be constant during one run of the
> program) and generates or updates a bunch of output files, I often
> had the problem of manually tracking which input files a certain output
> file depends on, in order to check the timestamps and decide whether
> the file needs to be re-created.
>
> I thought for a while about how to do this with a monad that does the
> bookkeeping for me. Assuming it's called ODIO (On-demand IO), I'd like
> a piece of code like this:
>
> do file1 <- readFileOD "someInput"
>    file2 <- readFileOD "someOtherInput"
>    writeFileOD "someOutput" (someComplexFunction file1 file2)
>
> to actually read "someInput" and "someOtherInput", do the calculation
> and write the output only if these inputs are newer than the output.
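
The rule stated here fits in a small pure function. In the sketch below, needsRebuild is a hypothetical name, and the timestamps are assumed to come from System.Directory.getModificationTime.

```haskell
import Data.Time.Clock (UTCTime (..))

-- Decide whether an output has to be regenerated.
-- First argument: the output's modification time, or Nothing if the
-- output does not exist yet. Second: the inputs' modification times.
needsRebuild :: Maybe UTCTime -> [UTCTime] -> Bool
needsRebuild Nothing _ = True
needsRebuild (Just out) inputs = any (> out) inputs
```
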
>
> The problem I stumbled over is that, given the type of >>=,
>  (>>=) :: Monad m => m a -> (a -> m b) -> m b
> I cannot "look ahead" at which files would be written without
> actually reading the requested file. Of course this is not always
> possible, although I expect code like this to be the exception:
> 
> do file1 <- readFileOD "someInput"
>    file2 <- readFileOD "someOtherInput"
>    let filename = decideFileNameBasedOn file2
>    writeFileOD filename (someComplexFunction file1 file2)
>
> But assuming that the input does not change during one run of the
> program, it should be safe to use "unsafeInterleaveIO" to only open and
> read an input when it is used. Then readFileOD could put the timestamp
> of the read file in a monad-local state, and writeFileOD could skip the
> writing if the output is newer than all inputs listed in the state. The
> unsafeInterleaveIO'ed file reads are then skipped as well, as long as
> they were not required for deciding the flow of the program.
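
Here is a minimal sketch of this design. It assumes ODIO is simply StateT over IO, from transformers' Control.Monad.Trans.State.Strict, with the list of input timestamps as state. runODIO is hypothetical. Having writeFileOD return whether it wrote is an assumption made for illustration. Unlike the proposal, this version never forgets timestamps, which can cause extra rebuilds but never missed ones.

```haskell
import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, modify)
import qualified Data.ByteString as B
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical ODIO: the state holds the modification times of the
-- inputs read so far in this run.
type ODIO = StateT [UTCTime] IO

runODIO :: ODIO a -> IO a
runODIO act = evalStateT act []

-- Record the input's timestamp now, but defer reading its contents
-- until they are actually demanded (safe only if the file does not
-- change during the run).
readFileOD :: FilePath -> ODIO B.ByteString
readFileOD path = do
  t <- liftIO (getModificationTime path)
  modify (t :)
  liftIO (unsafeInterleaveIO (B.readFile path))

-- Write the output only if it is missing or older than some recorded
-- input. Forcing 'contents' is what triggers the deferred reads, so a
-- skipped write also skips them. Returns whether the file was written.
writeFileOD :: FilePath -> B.ByteString -> ODIO Bool
writeFileOD out contents = do
  inputs <- get
  exists <- liftIO (doesFileExist out)
  stale <-
    if not exists
      then return True
      else do
        outTime <- liftIO (getModificationTime out)
        return (any (> outTime) inputs)
  when stale (liftIO (B.writeFile out contents))
  return stale
```

Suppose the same job runs twice with an unchanged input. On the second run, writeFileOD returns False. The deferred read is never demanded, so the input file is never opened.
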
>
> One nice thing is that the implementation of (>>) knows that files read
> in the first action will not affect files written in the second, so, in
> contrast to MonadState, we can forget about them, which I hope leads to
> quite good guesses as to which files are relevant for a certain
> writeFileOD operation. Also, a function
>  cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a
> can be used to write an (expensive) intermediate result, such as the
> EXIF information extracted from a file, to disk, so that it can be used
> without re-reading the large image file.
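
Here is one possible reading of cacheResultOD. It assumes the same kind of ODIO, StateT over IO holding input timestamps, filled in by a readFileOD that is not shown here. If the cache file is at least as new as every input seen so far, the result is read back and the expensive argument is never evaluated. runODIO is a hypothetical helper.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get)
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime)

-- Hypothetical ODIO: state = modification times of inputs read so far.
type ODIO = StateT [UTCTime] IO

runODIO :: ODIO a -> IO a
runODIO act = evalStateT act []

-- Reuse the cached value if the cache file is at least as new as all
-- recorded inputs; otherwise evaluate 'value' and write it out.
cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a
cacheResultOD cache value = do
  inputs <- get
  exists <- liftIO (doesFileExist cache)
  fresh <-
    if not exists
      then return False
      else do
        t <- liftIO (getModificationTime cache)
        return (all (<= t) inputs)
  if fresh
    then liftIO $ do
      contents <- readFile cache
      -- 'read' must consume the whole string, so the lazy handle is closed here.
      return $! read contents
    else do
      liftIO (writeFile cache (show value))
      return value
```

Like the state-based sketch, this one compares against every input seen so far rather than only those relevant to this result, which errs toward recomputing.
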
>
> Is that a sane idea?
>
> I'm also considering using this example for a talk about monads at the
> GPN¹ next weekend.