[Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators, enumeratees?

Tue Aug 24 08:32:20 EDT 2010

I think the big problem with chunking is that many useful iteratees need to
be able to inspect the length of the chunk.  The most obvious is "drop", but
there are many others.  Or if not inspect the length, have a new function on
the stream "dropReport :: Int -> s -> (s, Int)" which reports how much was
dropped.  Either way, chunking adds a deal of implementation burden.

I suspect that the proper vocabulary for iteratees wouldn't include chunks
at all, only single elements.  This discussion has prompted me to consider
the implications of such an implementation, as it would be much simpler.  I
have one idea that I think will at least maintain performance for many
operations, although there will be performance hits too.  If the drawbacks
are in areas that aren't particularly useful, though, it may be acceptable.

John

> From: Conal Elliott <conal at conal.net>
>
> Here's a way I've been tinkering with to think about iteratees clearly.
>
> For simplicity, I'll stick with pure, error-free iteratees for now, and
> take
> chunks to be strings.  Define a function that runs the iteratee:
>
> > runIter :: Iteratee a -> [String] -> (a, [String])
>
> Note that chunking is explicit here.
>
> Next, a relation that an iteratee implements a given specification, defined
> by a state transformer:
>
> > sat :: Iteratee a -> State String a -> Bool
>
> Define sat in terms of concatenating chunks:
>
> > sat it st =
> >   second concat . runIter it == runState st . second concat
>
> where the RHS equality is between functions (pointwise/extensionally), and
> runState uses the representation of State directly
>
> > runState :: State s a -> s -> (a,s)
>
> (I think this sat definition is what Conrad was alluding to.)
>
> Now use sat to specify and verify operations on iteratees and to
> *synthesize* those operations from their specifications.  Some iteratees
> might not satisfy *any* (State-based) specification.  For instance, an
> iteratee could look at the lengths or number of its chunks and produce
> results accordingly.  I think of such iteratees as abstraction leaks.  Can
> the iteratee vocabulary be honed to make only well-behaved (specifiable)
> iteratees possible to express?  If so, can we preserve performance
> benefits?
>
> If indeed the abstraction leaks can be fixed, I expect there will be a
> simpler & more conventional semantics than sat above.
>
>  - Conal
>
>
> On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker <conrad at metadecks.org>
> wrote:
>
> > On 24 August 2010 14:47, Jason Dagit <dagit at codersbase.com> wrote:
> > >
> > >
> > > On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker <conrad at metadecks.org>
> > > wrote:
> > >>
> > >> On 24 August 2010 14:14, Jason Dagit <dagit at codersbase.com> wrote:
> > >> > I'm not a semanticist, so I apologize right now if I say something
> > >> > stupid or
> > >> > incorrect.
> > >> >
> > >> > On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott <conal at conal.net>
> > wrote:
> > >> >>>
> > >> >>> So perhaps this could be a reasonable semantics?
> > >> >>>
> > >> >>> Iteratee a = [Char] -> Maybe (a, [Char])
> > >> >>
> > >> >> I've been tinkering with this model as well.
> > >> >>
> > >> >> However, it doesn't really correspond to the iteratee interfaces
> I've
> > >> >> seen, since those interfaces allow an iteratee to notice size and
> > >> >> number of
> > >> >> chunks.  I suspect this ability is an accidental abstraction leak,
> > >> >> which
> > >> >> raises the question of how to patch the leak.
> > >> >
> > >> > From a purely practical viewpoint I feel that treating the chunking
> as
> > >> > an
> > >> > abstraction leak might be missing the point.  If you said, you
> wanted
> > >> > the
> > >> > semantics to acknowledge the chunking but be invariant under the
> size
> > or
> > >> > number of the chunks then I would be happier.
> > >>
> > >> I think that's the point, ie. to specify what the invariants should
> > >> be. For example (to paraphrase, very poorly, something Conal wrote on
> > >> the whiteboard behind me):
> > >>
> > >> run [concat [chunk]] == run [chunk]
> > >>
> > >> ie. the (a, [Char]) you maybe get from running an iteratee over any
> > >> partitioning of chunks should be the same, ie. the same as from
> > >> running it over the concatenation of all chunks, which is the whole
> > >> input [Char].
> > >
> > > I find this notation foreign.  I get [Char], that's the Haskell String
> > > type, but what is [chunk]?  I doubt you mean a list of one element.
> >
> > sorry, that was just my way of writing "the list of chunks" or perhaps
> > "the stream of chunks that represents the input".
> >
> > Conrad.
> >
> > >
> > >>
> > >> > I use iteratees when I need to be explicit about chunking and when I
> > >> > don't
> > >> > want the resources to "leak outside" of the stream processing.  If
> you
> > >> > took
> > >> > those properties away, I wouldn't want to use it anymore because
> then
> > it
> > >> > would just be an inelegant way to do things.
> > >>
> > >> Then I suppose the model for Enumerators is different than that for
> > >> Iteratees; part of the point of an Enumerator is to control the size
> > >> of the chunks, so that needs to be part of the model. An Iteratee, on
> > >> the other hand, should not have to know the size of its chunks. So you
> > >> don't want to be able to know the length of a chunk (ie. a part of the
> > >> stream), but you do want to be able to, say, fold over it, and to be
> > >> able to stop the computation at any time (these being the main point
> > >> of iteratees ...).
> > >
> > > I think I agree with that.
> > > Jason
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100824/ffcad37e/attachment.html