[Haskell-cafe] mdo with multiple values

Heinrich Apfelmus apfelmus at quantentunnel.de
Thu Jan 16 10:34:52 UTC 2014


Joachim Breitner wrote:
> the background is a binary format assembler, so you can think of the
> monad as the Put monad, only sufficiently lazy to admit a useful
> MonadFix instance. Then one can do nice things like
> 
>         mdo 
>           putWord32 offset
>           putBS someMoreHeaderData
>           ...
>           offset <- getCurrentOffset
>           putBS byteString1
> 
> where I conceptually use the offset before it is known.
>         
> [..]     
> Now I try to generalize that to a list of Bytestrings, and just from the
> looks of it, this is what you want to do:
> 
>      mdo 
>         mapM_ putWord32 offsets
>         putBS someMoreHeaderData
>         ...
>         offsets <- forM byteStrings $ \b -> 
>            offset <- getCurrentOffset
>            putBS b
>            return offset
> 
> but that will <<loop>>.

So, the overall idea is that we can reserve 32 bits for the offset and 
defer its calculation to later. In other words,  putWord32  can "forward 
the file pointer" long before it actually writes the word.


Now, what's the problem with the `offsets` list? I think it may actually 
work in some monads that have a good MonadFix instance, but often, the 
MonadFix instances tend to be poor. One noteable example of the latter 
is the IO monad, and I have found the following rule to be very useful:

   To use value recursion in the IO monad, make sure
   that the sequence of *computations* is always
   defined in advance, only the values that these
   computations operate on may be lazy.

This rule is best explained with the problem at hand, by asking the 
following question: how many `putWord32` instructions does the following 
expression generate?

   mapM_ putWord32 offsets

Clearly, it should be `length offsets` many, but the problem is that 
this number is not known until the spine of `offsets` has been 
calculated. Now, some monads may be able to do that, but the rule for IO 
is that the number of `putWord32` must be known before the later 
definition of `offsets` can yield a value different from _|_ at all.

Now, if a recursive expression works in the IO monad, then it will work 
in any other monad as well.

Fortunately, we do known the spine of `offsets` in advance: it has the 
same spine as `byteStrings`. The solution is to make that explicit in 
the code, by using a lazy `zip`:

      ...
      mapM_ putWord32 (offsets `spine` byteStrings)
      ...

    where
    spine :: [a] -> [void] -> [a]
    spine ~[]     []     = []
    spine ~(x:xs) (y:ys) = x : tag xs ys

This code takes the spine of `byteStrings` and fills it with values from 
`offsets`.


It may also be possible to get rid of the <<loop>> by defining the monad 
appropriately: one pass calculates all offsets, a second pass calculates 
the actual output. The spine of `offsets` does not depend on the offset 
calculation, so there is a good chance that this might work recursively.


Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



More information about the Haskell-Cafe mailing list