[Haskell-cafe] Combining Regions and Iteratees

oleg at okmij.org oleg at okmij.org
Sat Jan 21 11:10:23 CET 2012

Regions is an automatic resource management technique that statically
ensures that all allocated resources are freed and a freed resource
cannot be used. Regions also promote efficiency by helping to structure
the computation so that resources will be freed soon, and en
masse. Therefore, regions are particularly suitable for scarce
resources such as file handles, database or network connections, etc.
A lightweight monadic region library is available on Hackage.

Iteratee IO also aims, among other things, to encapsulate resources
such as file handles and network connections, ensuring their safe use
and prompt disposal. One may wonder how much Iteratees and Regions
have in common and if that commonality can be factored out.  

There seem to be several ways to combine regions and iteratees.  This
message describes the most straightforward attempt, combining a
monadic region library (mostly as it is) with an Iteratee IO library,
also mostly as it is. We use monadic regions to manage file handles or
file descriptors, ensuring that file handles are always closed even in
case of IO and other asynchronous exceptions. An enumerator like
enumFile provided similar guarantees for its handles. (Since an
enumerator keeps its handles to itself, there is no danger of
iteratees' misusing them.) With the monadic region library, the
enumerator code becomes simpler: we no longer have to worry about
exceptions. The main benefit of monadic region library is to manage
files opened by iteratees. The latter being passed around, and so
their resources are harder to keep track.

We thus demonstrate enumFile and iterFile for incremental file reading
and writing, with the same safety guarantees.  All opened files are
*always* closed, regardless of any (asynchronous) exceptions that may
arise during opening, reading, writing or transforming. The code has
many examples of throwing errors from various stages of the pipeline and
at various times. All files are closed. 

The commented code is available at
which uses the lightweight monadic regions library from

Since lightweight monadic library needs rank-2 types (now standard),
it seemed appropriate to avail ourselves to common GHC extensions. 
We can clearly see that enumerators and enumeratees unify, both being
instances of a general type
	forall a. Iteratee e mi a -> mo (Iteratee e mi a)
which is a Monoid. To compose enumerators or enumeratees, we use the
standard mappend.

An alias in the code
	type R e m = Iteratee e m
suggests that Iteratees are the view from the right -- the System.IO,
getChar-like view. From that point of view, Iteratee IO is hardly different
from System.IO (getChar, getLine, peekChar, etc). The dual
	newtype L e mi mo = L{unL :: forall a. R e mi a -> mo (R e mi a)}
is the view from the left. 

Here are a few examples from the IterReg code. The first simply copies one
file to another, block-by-clock.

tIF1 = runSIO $ 
       run =<< unL (enumFile "/etc/motd") (iterFile "/tmp/x")

According to the trace
	opened file /etc/motd
	iterFile: opening /tmp/x
	Closing {handle: /etc/motd}
	Closing {handle: /tmp/x}
the files are indeed closed, but _not_ in the LIFO order. That is
important, so to let an iteratee write data coming from several sources.
For example:

tIF3 = runSIO $ 
       run =<< unL (enumFile "/etc/motd" `mappend`
		    enumFile "/usr/share/dict/words")
	     (iterFile "/tmp/x")

    opened file /etc/motd
    iterFile: opening /tmp/x
    Closing {handle: /etc/motd}
    opened file /usr/share/dict/words
    Closing {handle: /usr/share/dict/words}
    Closing {handle: /tmp/x}

The files will be closed even in case of exceptions:

tIF4 = runSIO $ 
       run =<< unL (enumFile "/etc/motd" `mappend`
		    enumFile "/nonexistent")
	     (iterFile "/tmp/x")

    opened file /etc/motd
    iterFile: opening /tmp/x
    Closing {handle: /etc/motd}
    opened file /nonexistent
    Closing {handle: /tmp/x}
    *** Exception: /nonexistent: openFile: does not exist

All monadic region monads all support shCatch, so we can write our own
exception-handling code. Other examples in IterReg.hs raise errors
during data transformation.

Monadic regions plus GHC extensions simplify code. For example, here
are iterFile and enumFile (the signatures could be omitted; they will
be inferred)

iterFile :: (SMonad1IO m, m ~ (IORT s' m')) 
	    => FilePath -> R ByteString m ()
iterFile fname = lift (newSHandle fname WriteMode) >>= loop
 loop h = getChunk >>= check h
 check h (Chunk s) = lift (mapM (shPut h) s) >> loop h
 check h e  = return ()

enumFile :: (SMonadIO m) => FilePath -> L ByteString m m
enumFile filepath = L $ \iterv -> do
  newRgn $ do
        h <- newSHandle filepath ReadMode
        unL (enumHandle h) iterv

More information about the Haskell-Cafe mailing list