[Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators, enumeratees?

John Lato jwlato at gmail.com
Tue Aug 24 11:31:31 EDT 2010


Hi Conal,

I'm aware of one case that violates semantic referential transparency, but
it's a bug.  Which pretty much proves your point as I understand it.

John

On Tue, Aug 24, 2010 at 2:01 PM, Conal Elliott <conal at conal.net> wrote:

> Hi John,
>
> Please note that I'm suggesting eliminating chunks from the semantics only
> -- not from the implementation.
>
> For precise & simple chunk-less semantics, it's only important that the
> iteratees map equivalent input streams to equivalent output streams, where
> "equivalent" means equal after concatenating all of the chunks.  In other
> words, the chunk lists denote their concatenations, so semantically equal
> inputs must lead to semantically equal outputs.  (Assuming I understand the
> intention of chunking as being an implementation issue only, i.e., present
> only for optimization.)  We could call this property "semantic referential
> transparency".  IIUC, 'drop' is semantically RT, since it's *specified* in
> terms of elements (and only *implemented* in terms of chunks).
>
> Do you know of any iteratees in use that map (semantically) equal inputs to
> (semantically) unequal outputs, i.e.,that  violate semantic RT as I've
> defined it?  In the current APIs, one can easily define such iteratees, but
> I'm hoping that the programming interfaces can be repaired to eliminate that
> problem (the "abstraction leaks" I've been mentioning).
>
>    - Conal
>
>
> On Tue, Aug 24, 2010 at 9:32 PM, John Lato <jwlato at gmail.com> wrote:
>
>> I think the big problem with chunking is that many useful iteratees need
>> to be able to inspect the length of the chunk.  The most obvious is "drop",
>> but there are many others.  Or if not inspect the length, have a new
>> function on the stream "dropReport :: Int -> s -> (s, Int)" which reports
>> how much was dropped.  Either way, chunking adds a deal of implementation
>> burden.
>>
>> I suspect that the proper vocabulary for iteratees wouldn't include chunks
>> at all, only single elements.  This discussion has prompted me to consider
>> the implications of such an implementation, as it would be much simpler.  I
>> have one idea that I think will at least maintain performance for many
>> operations, although there will be performance hits too.  If the drawbacks
>> are in areas that aren't particularly useful, though, it may be acceptable.
>>
>> John
>>
>>
>>> From: Conal Elliott <conal at conal.net>
>>>
>>>
>>> Here's a way I've been tinkering with to think about iteratees clearly.
>>>
>>> For simplicity, I'll stick with pure, error-free iteratees for now, and
>>> take
>>> chunks to be strings.  Define a function that runs the iteratee:
>>>
>>> > runIter :: Iteratee a -> [String] -> (a, [String])
>>>
>>> Note that chunking is explicit here.
>>>
>>> Next, a relation that an iteratee implements a given specification,
>>> defined
>>> by a state transformer:
>>>
>>> > sat :: Iteratee a -> State String a -> Bool
>>>
>>> Define sat in terms of concatenating chunks:
>>>
>>> > sat it st =
>>> >   second concat . runIter it == runState st . second concat
>>>
>>> where the RHS equality is between functions (pointwise/extensionally),
>>> and
>>> runState uses the representation of State directly
>>>
>>> > runState :: State s a -> s -> (a,s)
>>>
>>> (I think this sat definition is what Conrad was alluding to.)
>>>
>>> Now use sat to specify and verify operations on iteratees and to
>>> *synthesize* those operations from their specifications.  Some iteratees
>>> might not satisfy *any* (State-based) specification.  For instance, an
>>> iteratee could look at the lengths or number of its chunks and produce
>>> results accordingly.  I think of such iteratees as abstraction leaks.
>>>  Can
>>> the iteratee vocabulary be honed to make only well-behaved (specifiable)
>>> iteratees possible to express?  If so, can we preserve performance
>>> benefits?
>>>
>>> If indeed the abstraction leaks can be fixed, I expect there will be a
>>> simpler & more conventional semantics than sat above.
>>>
>>>  - Conal
>>>
>>>
>>> On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker <conrad at metadecks.org>
>>> wrote:
>>>
>>> > On 24 August 2010 14:47, Jason Dagit <dagit at codersbase.com> wrote:
>>> > >
>>> > >
>>> > > On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker <
>>> conrad at metadecks.org>
>>> > > wrote:
>>> > >>
>>> > >> On 24 August 2010 14:14, Jason Dagit <dagit at codersbase.com> wrote:
>>> > >> > I'm not a semanticist, so I apologize right now if I say something
>>> > >> > stupid or
>>> > >> > incorrect.
>>> > >> >
>>> > >> > On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott <conal at conal.net>
>>> > wrote:
>>> > >> >>>
>>> > >> >>> So perhaps this could be a reasonable semantics?
>>> > >> >>>
>>> > >> >>> Iteratee a = [Char] -> Maybe (a, [Char])
>>> > >> >>
>>> > >> >> I've been tinkering with this model as well.
>>> > >> >>
>>> > >> >> However, it doesn't really correspond to the iteratee interfaces
>>> I've
>>> > >> >> seen, since those interfaces allow an iteratee to notice size and
>>> > >> >> number of
>>> > >> >> chunks.  I suspect this ability is an accidental abstraction
>>> leak,
>>> > >> >> which
>>> > >> >> raises the question of how to patch the leak.
>>> > >> >
>>> > >> > From a purely practical viewpoint I feel that treating the
>>> chunking as
>>> > >> > an
>>> > >> > abstraction leak might be missing the point.  If you said, you
>>> wanted
>>> > >> > the
>>> > >> > semantics to acknowledge the chunking but be invariant under the
>>> size
>>> > or
>>> > >> > number of the chunks then I would be happier.
>>> > >>
>>> > >> I think that's the point, ie. to specify what the invariants should
>>> > >> be. For example (to paraphrase, very poorly, something Conal wrote
>>> on
>>> > >> the whiteboard behind me):
>>> > >>
>>> > >> run [concat [chunk]] == run [chunk]
>>> > >>
>>> > >> ie. the (a, [Char]) you maybe get from running an iteratee over any
>>> > >> partitioning of chunks should be the same, ie. the same as from
>>> > >> running it over the concatenation of all chunks, which is the whole
>>> > >> input [Char].
>>> > >
>>> > > I find this notation foreign.  I get [Char], that's the Haskell
>>> String
>>> > > type, but what is [chunk]?  I doubt you mean a list of one element.
>>> >
>>> > sorry, that was just my way of writing "the list of chunks" or perhaps
>>> > "the stream of chunks that represents the input".
>>> >
>>> > Conrad.
>>> >
>>> > >
>>> > >>
>>> > >> > I use iteratees when I need to be explicit about chunking and when
>>> I
>>> > >> > don't
>>> > >> > want the resources to "leak outside" of the stream processing.  If
>>> you
>>> > >> > took
>>> > >> > those properties away, I wouldn't want to use it anymore because
>>> then
>>> > it
>>> > >> > would just be an inelegant way to do things.
>>> > >>
>>> > >> Then I suppose the model for Enumerators is different than that for
>>> > >> Iteratees; part of the point of an Enumerator is to control the size
>>> > >> of the chunks, so that needs to be part of the model. An Iteratee,
>>> on
>>> > >> the other hand, should not have to know the size of its chunks. So
>>> you
>>> > >> don't want to be able to know the length of a chunk (ie. a part of
>>> the
>>> > >> stream), but you do want to be able to, say, fold over it, and to be
>>> > >> able to stop the computation at any time (these being the main point
>>> > >> of iteratees ...).
>>> > >
>>> > > I think I agree with that.
>>> > > Jason
>>> >
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100824/b5b1254b/attachment.html


More information about the Haskell-Cafe mailing list