[Haskell-cafe] Iteratee: manyToOne

Fri Apr 29 14:26:58 CEST 2011

On Fri, Apr 29, 2011 at 12:20 PM, Felipe Almeida Lessa <
felipe.lessa at gmail.com> wrote:

> On Fri, Apr 29, 2011 at 6:32 AM, John Lato <jwlato at gmail.com> wrote:
> > If you do this, the user needs to take care to order the iteratees so that
> > the last iteratee has small leftovers.  Consider:
> >
> > manyToOne [consumeALot, return ()]
> >
> > In this case, the entire stream consumed by the first iteratee will need to
> > be retained and passed on by manyToOne.  In many cases, the user may not
> > know how much each iteratee will consume, which can make these semantics
> > problematic.
> >
> > Iteratee has 'enumPair', (renamed 'zip' in HEAD) which returns the leftovers
> > from whichever iteratee consumes more.  This avoids the problem of retaining
> > extra data, and seems simpler to reason about.  Although if you really need
> > to consume a predictable amount of data, the safest is probably to run the
> > whole thing in a 'take'.
>
> My motivation is: in general it is difficult (impossible?) to choose
> the iteratee that consumed more data because you don't know what the
> data is.  For example, if you give 'Chunks [a,b]' to two iteratees and
> one of them returns 'Chunks [c]' and the other one returns 'Chunks
> [d]', which one consumed more data?  The answer is that it depends on
> the types.  If they are Ints, both consumed the same, if they are
> ByteStrings, you would need to check if one is prefix of the other.
> What if one returns 'Chunks [c]' and the other one returns 'Chunks
> [d,e]'?  If they are ByteStrings, should we compare 'c' against 'd ++
> e'?
>

This situation results from the implementation in the enumerator package.
In iteratee it doesn't arise with well-behaved* iteratees, because only one
chunk is ever processed at a time.  It's only necessary to compare the
lengths of the returned chunks: the iteratee that returns the shorter
leftover chunk is the one that consumed more data.

By well-behaved, I mean that the chunk returned by an iteratee must be a
tail of the provided input.  In other words, it returns only unconsumed data
from the stream and doesn't alter the stream.  At least in the iteratee
package, an iteratee which violates this rule is likely to result in
undefined behavior (in general, not just this function).
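
To make the length comparison concrete, here is a minimal sketch using a toy
model.  It does not use the iteratee package's actual types or API: 'Step',
'wellBehaved', 'zipSteps' and 'takeN' are hypothetical names introduced only
for illustration, and a chunk is modelled as a plain list.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical toy model (NOT the iteratee package's API): a step receives
-- one chunk and returns a result together with the unconsumed remainder.
type Step a r = [a] -> (r, [a])

-- "Well-behaved" in the sense above: the leftover is a tail of the input.
wellBehaved :: Eq a => Step a r -> [a] -> Bool
wellBehaved step chunk = snd (step chunk) `isSuffixOf` chunk

-- Feed the same chunk to two steps and keep the leftover of whichever
-- consumed more.  Because both leftovers are tails of the same chunk,
-- the shorter leftover belongs to the step that consumed more.
zipSteps :: Step a r1 -> Step a r2 -> Step a (r1, r2)
zipSteps s1 s2 chunk =
  let (r1, l1) = s1 chunk
      (r2, l2) = s2 chunk
      rest     = if length l1 <= length l2 then l1 else l2
  in  ((r1, r2), rest)

-- Example step: consume the first n elements of the chunk.
takeN :: Int -> Step a [a]
takeN = splitAt
```

In this model, 'zipSteps (takeN 1) (takeN 3) [1,2,3,4,5]' leaves '[4,5]' as
the leftover, i.e. that of the step which consumed three elements.  A step
such as '\xs -> ((), reverse xs)' fails 'wellBehaved', and comparing lengths
would then say nothing about how much it consumed.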

>
> So I thought it would be easier to program with an API that is
> predictable and immune to changes in block sizes.  If you don't want
> leftovers, just use 'manyToOne [..., dropWhile (const True)]', which
> guarantees that you won't leak.
>

Iteratees should be immune to changes in block sizes anyway, although it's
been a while since I looked at the enumerator implementation, so it could be
different there.

If you use 'manyToOne [..., dropWhile (const True)]', when does it
terminate?

John