[Haskell-cafe] A Monad for on-demand file generation?

Tue Jul 1 17:52:02 EDT 2008

On 7/1/08, Joachim Breitner <mail at joachim-breitner.de> wrote:
> Hi,
>
> thanks for your comments.
>
> On Monday, 30.06.2008, at 16:54 -0700, Ryan Ingram wrote:
> > 1) unsafeInterleaveIO seems like a big hammer to use for this problem,
> > and there are a lot of gotchas involved that you may not have fully
> > thought out.  But you do meet the main criteria (file being read is
> > assumed to be constant for a single run of the program).
>
> Any other gotchas? Anyway, is this really worse than the similarly lazy
> readFile? Using that would not save the call to open, but it would at
> least save the reading and processing, in the same situations.

Well, you're also (from your description) probably writing some
tracking information to an IORef of some sort.  That can happen in the
middle of an otherwise pure computation, and it's difficult to know
exactly when it'll get triggered, due to laziness.  You can probably
make it work :)
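To make the IORef gotcha concrete, here is a minimal sketch, assuming a hypothetical "lazyReadTracked" in place of a real dependency-tracking reader (it takes the contents as an argument instead of touching the disk).  The dependency only lands in the IORef when something actually forces the result, which may be much later than the line that "read" the file:

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for a lazy, dependency-tracking read. The
-- "contents" are passed in directly so the sketch needs no real file.
lazyReadTracked :: IORef [FilePath] -> FilePath -> String -> IO String
lazyReadTracked logRef path contents = unsafeInterleaveIO $ do
  modifyIORef' logRef (path :)  -- deferred until the result is demanded
  return contents

-- Returns the dependency log before and after the result is forced.
demo :: IO ([FilePath], [FilePath])
demo = do
  logRef <- newIORef []
  s <- lazyReadTracked logRef "bar.txt" "file contents"
  before <- readIORef logRef    -- still empty: s has not been forced yet
  _ <- evaluate (length s)      -- forcing s runs the deferred write
  after <- readIORef logRef
  return (before, after)
```

If the result is never forced, nothing is recorded at all, which is exactly the kind of surprise to watch for.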

> > If you have the ability to store metadata about the computation along
> > with the computation results, maybe that would be a better solution?
>
> Not sure what you mean here, sorry. Can you elaborate?

Well, while doing the computation the first time, you can track what
depends on what.  Then you save *that* information out.  Here's an
example:

main = runODIO $ do
    do
        bar <- readFileOD "bar.txt"
        baz <- readFileOD "baz.txt"
        let result = expensiveComputation bar baz
        writeFileOD "foo.bin" result

    do
        hat <- readFileOD "hat.txt"
        let result = otherComputation hat
        writeFileOD "foo2.bin" result

Now, as you mentioned before, you know that the RHS of >> doesn't
depend on the files read on the LHS, so the two "do" blocks here are
independent.  If you run with no saved information, you run the whole
computation and write out in your metadata "First we are going to
build foo.bin from bar.txt and baz.txt, and then we build foo2.bin
from hat.txt".  On the next run, when you get to the first "do" block,
you know what computation is about to happen (since you recorded it
last time), so you can compare the timestamp of foo.bin against those
of bar.txt and baz.txt and potentially skip the whole block.
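A minimal sketch of the metadata side, assuming the recorded information is just a list of hypothetical "Rule" values (one per independent "do" block) saved with show and loaded back with read.  The staleness test is the usual make-style rule; only doesFileExist and getModificationTime come from the directory package, the rest is made up for illustration:

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- Hypothetical metadata entry: one per independent "do" block,
-- written out on the previous run (e.g. with show, read back with read).
data Rule = Rule
  { ruleOutput :: FilePath
  , ruleInputs :: [FilePath]
  } deriving (Show, Read, Eq)

-- Pure core: rebuild if the output is missing or any input is newer.
needsRebuild :: Ord t => Maybe t -> [t] -> Bool
needsRebuild Nothing _ = True
needsRebuild (Just outTime) inTimes = any (> outTime) inTimes

-- Check a recorded rule against the file system before running its block.
-- Note: getModificationTime throws if an input file does not exist.
ruleIsStale :: Rule -> IO Bool
ruleIsStale (Rule out ins) = do
  exists <- doesFileExist out
  outTime <- if exists then Just <$> getModificationTime out else return Nothing
  inTimes <- mapM getModificationTime ins
  return (needsRebuild outTime inTimes)
```

For the example above, the saved metadata would be something like [Rule "foo.bin" ["bar.txt", "baz.txt"], Rule "foo2.bin" ["hat.txt"]], and a block whose rule is not stale can be skipped entirely.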

Of course now the metadata depends on the script itself, but you
already had to deal with that problem :)

> > 2) I agree with Luke that this "smells" more like an applicative
> > functor.  But getting to monad syntax is quite nice if you can do so.
> > As an applicative functor you would have "writeFileOD :: Filename ->
> > ODIO ByteString -> ODIO ()"; then writeFileOD can handle all the
> > necessary figuring out of timestamps itself, and you get the bonus
> > guarantee that the contents of the files read by the "ODIO ByteString"
> > argument won't affect the filename you are going to output to.
>
> I thought about this (without having the applicative abstraction in
> mind). This would then look like:
>
> main = do
>   f1 <- readFileOD "infile1"
>   f2 <- readFileOD "infile2"
>   writeFileOD "outfile1" $ someFunc <$> f1 <*> f2
>   writeFileOD "outfile2" $ someOtherFunc <$> f1
>
> right?

Not exactly.  Try this:

   writeFileOD "outfile1" (someFunc <$> readFileOD "infile1"
                                    <*> readFileOD "infile2")
   writeFileOD "outfile2" (someOtherFunc <$> readFileOD "infile1")

(or, equivalently, replace the "<-" with "let .. in" in your example).
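Here is one way the applicative version could look, as a hedged sketch: this "ODIO" is hypothetical, uses String instead of ByteString, and has writeFileOD return plain IO () rather than ODIO () to stay short.  The point is that the list of input files is available before any of them is read, so writeFileOD can decide from timestamps alone whether to run the computation:

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, getModificationTime)

-- Hypothetical applicative ODIO: the files it reads are known up front,
-- separately from the IO action that actually reads them.
data ODIO a = ODIO { odInputs :: [FilePath], odRun :: IO a }

instance Functor ODIO where
  fmap f (ODIO fs act) = ODIO fs (fmap f act)

instance Applicative ODIO where
  pure x = ODIO [] (pure x)
  ODIO fs f <*> ODIO gs x = ODIO (fs ++ gs) (f <*> x)

-- String instead of ByteString to keep the sketch dependency-free.
readFileOD :: FilePath -> ODIO String
readFileOD path = ODIO [path] (readFile path)

-- Timestamps are checked before odRun is executed, so when the output
-- is up to date none of the inputs are read.
writeFileOD :: FilePath -> ODIO String -> IO ()
writeFileOD out od = do
  exists <- doesFileExist out
  stale <- if not exists
             then return True
             else do
               outTime <- getModificationTime out
               inTimes <- mapM getModificationTime (odInputs od)
               return (any (> outTime) inTimes)
  when stale (odRun od >>= writeFile out)
```

Note that with this version "infile1" simply appears in the input lists of both outputs; nothing here shares the read.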

> Will it still work so that if both outfiles need to be generated,
> f1 is read only once?

That depends on how you write it!  Remember that you can write your
applicative functor to just build up a graph of what computation might
need to be done.  You can then analyze that graph and look for sharing
if necessary.
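As a rough illustration of looking for sharing, assume the "graph" is flattened to just the list of files each pending output reads (the hypothetical "Plan" type below), which is the kind of information an applicative ODIO can collect without running anything.  Each distinct input is then read once and handed out from a map:

```haskell
import Data.List (nub)
import qualified Data.Map as Map

-- Hypothetical, simplified "graph": each output paired with the files
-- its computation reads.
type Plan = [(FilePath, [FilePath])]

-- Every distinct input across all outputs, in first-seen order.
sharedInputs :: Plan -> [FilePath]
sharedInputs plan = nub (concatMap snd plan)

-- Read each distinct input exactly once; outputs then look up contents
-- in the resulting map instead of re-reading files they share.
readOnce :: (FilePath -> IO String) -> Plan -> IO (Map.Map FilePath String)
readOnce reader plan =
  Map.fromList <$> mapM (\p -> (,) p <$> reader p) (sharedInputs plan)
```

With readFile as the reader and the plan [("outfile1", ["infile1", "infile2"]), ("outfile2", ["infile1"])], "infile1" is opened once even though both outputs depend on it.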

If you want the sharing to be explicit, you need something a bit more
monad-ish.  If the type of "readFileOD" is "Filename -> ODIO (ODIO
ByteString)" then your original syntax works and gives you a chance to
pick up on the explicit sharing by labelling the result of "f1 <-
...".
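A small sketch of the nested type, with ODIO simplified to plain IO for brevity (an assumption, not what a real ODIO would be).  The outer action creates a shared handle, here a hypothetical "shareRead" backed by an IORef cache, and running the inner action reads the file at most once no matter how many outputs use it:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef, modifyIORef')

-- In the spirit of "Filename -> ODIO (ODIO ByteString)", with ODIO
-- simplified to IO: the outer action allocates the handle (the label
-- you bind with "f1 <- ..."), the inner action yields the contents.
shareRead :: (FilePath -> IO String) -> FilePath -> IO (IO String)
shareRead reader path = do
  cache <- newIORef Nothing
  return $ do
    cached <- readIORef cache
    case cached of
      Just contents -> return contents   -- already read: reuse it
      Nothing -> do
        contents <- reader path          -- first use: really read
        writeIORef cache (Just contents)
        return contents

-- Demo: two uses of the same handle; counts how often the reader ran.
demoSharing :: IO (String, String, Int)
demoSharing = do
  calls <- newIORef (0 :: Int)
  let reader p = modifyIORef' calls (+ 1) >> return ("contents of " ++ p)
  f1 <- shareRead reader "infile1"
  a <- f1
  b <- f1
  n <- readIORef calls
  return (a, b, n)
```

readFileOD would then be shareRead readFile.  The cache is not thread-safe, and a graph-building ODIO could instead use the label as a node identifier rather than a memo cell.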

  -- ryan