[Haskell-cafe] Iteratees again (Was: How on Earth Do You Reason about Space?)

dm-list-haskell-cafe at scs.stanford.edu
Thu Jun 2 22:21:48 CEST 2011


At Thu, 02 Jun 2011 13:52:52 +0200,
Ketil Malde wrote:
> 
> I have a bunch of old code, parsers etc, which are based on the
> 'readFile' paradigm:
> 
>   type Str = Data.ByteString.Lazy.Char8.ByteString -- usually
> 
>   decodeFoo :: Str -> Foo
>   encodeFoo :: Foo -> Str
> 
>   readFoo = decodeFoo . readFile 
>   writeFoo f = writeFile f . encodeFoo
>   hReadFoo = decodeFoo . hRead
>   :
>   (etc)
> 
> This works pretty well, as long as Foo is strict enough that you don't
> retain all or huge parts of input, and as long as you can process input
> in a forward, linear fashion.  And, like my frequency count above, I
> can't really see how this can be made much simpler.

This is fine if you never have parse errors and always read to the end
of the file.  Otherwise, the code above is incorrect and ends up
leaking file descriptors.  In general, it is very hard to write
parsers that parse every possible input and never fail.  Thus, for
anything other than a toy program, your code actually has to be:

	readFoo path = bracket (openFile path ReadMode) hClose $
		hGetContents >=> (\s -> return $! decodeFoo s)

Which is still not guaranteed to work if Foo contains thunks, so then
you end up having to write:

	readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
	  s <- hGetContents h
	  let foo = decodeFoo s
	  deepseq foo $ return foo
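The weak-head-normal-form point is worth spelling out: `$!` (and plain
seq) force only the outermost constructor, so thunks stored inside
Foo's fields survive, which is what makes the deepseq version
necessary.  A minimal, self-contained demonstration (Foo here is a
hypothetical one-field type, not from any real library):

	import Control.Exception (ErrorCall, evaluate, try)

	-- Hypothetical one-field type standing in for Foo.
	data Foo = Foo String

	main :: IO ()
	main = do
	  let foo = Foo (error "unevaluated thunk")
	  -- WHNF only: the constructor is forced, the field is not,
	  -- so this succeeds without touching the error.
	  _ <- evaluate foo
	  -- Forcing the field's length finally fires the thunk.
	  r <- try (evaluate (length ((\(Foo s) -> s) foo)))
	         :: IO (Either ErrorCall Int)
	  putStrLn (either (const "field was still a thunk") show r)

Run with runghc, this prints "field was still a thunk": the bare
evaluate sailed past the bomb hidden in the field.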

Or, finally, what a lot of code falls back to, inserting gratuitous
calls to length:

	readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
	  s <- hGetContents h
	  length s `seq` return (decodeFoo s)
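Fleshed out, that last pattern compiles and runs as-is.  This is only
a sketch: decodeFoo is stood in by a hypothetical line count, and
foo.txt is a throwaway file created just for the demonstration:

	import Control.Exception (bracket)
	import System.IO

	-- Hypothetical stand-in for decodeFoo: count the input's lines.
	decodeFoo :: String -> Int
	decodeFoo = length . lines

	-- Force the whole file to be read before hClose runs, by
	-- demanding the length of the lazy contents.
	readFoo :: FilePath -> IO Int
	readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
	  s <- hGetContents h
	  length s `seq` return (decodeFoo s)

	main :: IO ()
	main = do
	  writeFile "foo.txt" "one\ntwo\nthree\n"
	  n <- readFoo "foo.txt"
	  print n  -- prints 3

Without the `length s `seq`` line, hClose would truncate the lazy read
and decodeFoo could see only part of the file (or none of it).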

The equivalent code with the iterIO package would be:

	readFoo path = enumFile path |$ fooI

which seems a lot simpler to me...

> Would there be any great advantage to rewriting my stuff to use
> iterators?  Or at least, use iterators for new stuff?

In addition to avoiding edge cases like leaked file descriptors and
memory, one of the things I discovered in implementing iterIO is that
it's really handy to have your I/O functions be the same as your
parsing combinators.  So iteratees might actually admit a far simpler
implementation of decodeFoo/fooI.

More specifically, imagine that you have decodeFoo, and now want to
implement decodeBar where a Bar includes some Foos.  Unfortunately,
having an implementation of decodeFoo in-hand doesn't help you
implement decodeBar.  You'd have to re-write your function to return
residual input, maybe something like:

	decodeFooReal :: String -> (Foo, String)

	decodeFoo :: String -> Foo
	decodeFoo = fst . decodeFooReal

and now you implement decodeBar in terms of decodeFooReal, but you
have to pass around residual input explicitly, handle parsing failures
explicitly, etc.
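To make the residual-input point concrete, here is a minimal sketch in
plain Haskell (Foo, Bar, and the whitespace-splitting parse are all
made up for illustration):

	data Foo = Foo String deriving (Eq, Show)
	data Bar = Bar Foo Foo deriving (Eq, Show)

	-- Parse one whitespace-delimited word, returning the residual
	-- input alongside the result.
	decodeFooReal :: String -> (Foo, String)
	decodeFooReal s =
	  let (w, rest) = break (== ' ') (dropWhile (== ' ') s)
	  in (Foo w, rest)

	decodeFoo :: String -> Foo
	decodeFoo = fst . decodeFooReal

	-- Composing parsers means threading the leftover input by hand.
	decodeBarReal :: String -> (Bar, String)
	decodeBarReal s0 =
	  let (f1, s1) = decodeFooReal s0
	      (f2, s2) = decodeFooReal s1
	  in (Bar f1 f2, s2)

	main :: IO ()
	main = print (decodeBarReal "hello world")
	-- prints (Bar (Foo "hello") (Foo "world"),"")

Every function written in decodeBarReal's style must also decide what
to do when an inner parse fails, which is exactly the bookkeeping that
an iteratee (or parser-combinator) monad hides for you.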

> As I see it, iterators are complex and the dust is only starting to
> settle on implementations and interfaces, and will introduce more
> dependencies.  So my instinct is to stick with the worse-is-better
> approach, but I'm willing to be educated.

I fully agree with the point about dependencies and waiting for the
dust to settle, though I hope a lot of that changes in a year or so.
However, iterIO should already significantly reduce the complexity.

David


