[Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators,
dagit at codersbase.com
Tue Aug 24 04:14:51 EDT 2010
On Tue, Aug 24, 2010 at 12:49 AM, Heinrich Apfelmus <
apfelmus at quantentunnel.de> wrote:
> Jason Dagit wrote:
>> From a purely practical viewpoint I feel that treating the chunking
>> as an abstraction leak might be missing the point. If you said, you
>> wanted the semantics to acknowledge the chunking but be invariant
>> under the size or number of the chunks then I would be happier.
>> I use iteratees when I need to be explicit about chunking and when I
>> don't want the resources to "leak outside" of the stream processing.
>> If you took those properties away, I wouldn't want to use it anymore
>> because then it would just be an inelegant way to do things.
> I'm curious, can you give an example where you want to be explicit about
> chunking? I have a hard time imagining an example where chunking is
> beneficial compared to getting each character in sequence. Chunking
> seems to be common in C for reasons of performance, but how does that
> apply to Haskell?
It applies to Haskell for the same reasons, as far as I can tell. You want
it to manage performance characteristics. See my example below. If you
wrote it using chunking you wouldn't need lazy io (which is argued quite
well in other places to be bad, and I assume you've read the arguments and
more or less agree). Furthermore, wouldn't iteratees force you to implement
something equivalent to either option #1 or #2, but #3 wouldn't be possible?
I think it basically comes down to this: We replace lazy io with explicit
chunking because lazy io is unsafe, but explicit chunking can be safe.
So, if you had a lazy pure generator you wouldn't need chunking, although
perhaps the iteratee style would help avoid accidental space leaks that
happen from referencing the stream elements outside of the fold (like #3
below).
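
A minimal sketch of what explicit chunking without lazy io could look like,
assuming the bytestring package that ships with GHC. This is not an
iteratee library; foldChunks, Stats, addChunk, fileStats and the file name
are hypothetical names made up for illustration. The handle is opened and
closed inside withBinaryFile, each chunk can be collected once it has been
folded in, and the result does not depend on the chunk size.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import Data.Int (Int32)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Strictly fold a step function over chunks of at most chunkSize bytes.
-- hGetSome returns an empty ByteString at end of file.
foldChunks :: Int -> (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldChunks chunkSize step = go
  where
    go !acc h = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return acc
        else go (step acc chunk) h

-- Running byte sum (wraps around on Int32 overflow) and byte count.
-- Strict fields keep the accumulator from building up thunks.
data Stats = Stats !Int32 !Int deriving (Eq, Show)

addChunk :: Stats -> B.ByteString -> Stats
addChunk (Stats s n) chunk =
  Stats (B.foldl' (\acc w -> acc + fromIntegral w) s chunk)
        (n + B.length chunk)

-- The handle never escapes this function.
fileStats :: Int -> FilePath -> IO Stats
fileStats chunkSize path =
  withBinaryFile path ReadMode (foldChunks chunkSize addChunk (Stats 0 0))

main :: IO ()
main = do
  let path = "chunk-demo.bin"            -- hypothetical scratch file
  B.writeFile path (B.replicate 100000 97)
  small <- fileStats 7 path              -- tiny chunks
  large <- fileStats 65536 path          -- large chunks
  print small                            -- Stats 9700000 100000
  print (small == large)                 -- True
```

The chunk size only changes how many reads are issued, which is the sort of
performance knob I mean; the answer stays the same.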
> On the matter of leaking resources outside the stream processing,
> Iteratee does not give you any guarantees, it's only a stylistic aid
> (which can be powerful, of course). For instance, the following Iteratee
> returns the whole stream as a list:
I think your example is fine. I consider it a misbehaving iteratee, in the
same way that returning any large structure would be misbehaving in this
context. I think an iteratee returning something large is different from
letting things "leak out". It's like a difference of scope. A
well-behaved iteratee will reduce the input to a reasonable return value.
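
To make the distinction concrete, here is a toy sketch. The Iter type and
runChunks below are simplified stand-ins I am assuming for illustration,
not the API of any real iteratee package. countElems is well-behaved in the
sense above (it reduces the stream to one Int), while collectAll returns
the whole stream and so needs space proportional to the input.

```haskell
-- Toy iteratee: either finished with a result, or waiting for the next
-- chunk. Nothing signals end of stream.
data Iter el a
  = Done a
  | Cont (Maybe [el] -> Iter el a)

-- Feed a list of chunks to an iteratee, then signal end of stream.
runChunks :: [[el]] -> Iter el a -> a
runChunks _        (Done a) = a
runChunks []       (Cont k) = case k Nothing of
  Done a -> a
  Cont _ -> error "iteratee did not finish at end of stream"
runChunks (c : cs) (Cont k) = runChunks cs (k (Just c))

-- Well-behaved: keeps only a running count, never the chunks themselves.
countElems :: Iter el Int
countElems = go 0
  where
    go n = n `seq` Cont (\mc -> case mc of
      Nothing -> Done n
      Just c  -> go (n + length c))

-- "Misbehaving": accumulates every chunk and returns the whole stream.
collectAll :: Iter el [el]
collectAll = go []
  where
    go acc = Cont (\mc -> case mc of
      Nothing -> Done (concat (reverse acc))
      Just c  -> go (c : acc))

main :: IO ()
main = do
  let chunks = ["abc", "de", "f"]
  print (runChunks chunks countElems)      -- 6
  putStrLn (runChunks chunks collectAll)   -- abcdef
```

Both give the same answer however the input is split into chunks; only the
second one holds on to all of it.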
What would be bad is if other bits of code could reference parts of the
stream while the iteratee is looking at it, and hold on to them. That would
cause a space leak. An example of this bad behavior would be to use
readFile to read a file, then compute two things: a) the sum of the bytes
in the file as Int32, b) the length (in number of characters) of the file.
Supposing we use lazy io (Prelude.readFile):
1) read the file, compute (a), close the file, read the file, compute (b),
and finally close the file. You can do so in constant space.
2) read the file, use one pass to calculate both (a) and (b) at the same
time, then close the file. You can do so in constant space.
3) read the file, use one pass to compute (a) followed by a pass to
compute (b), then close the file. The space used will be O(filesize).
I consider option #3 to be letting the elements of the stream "leak out".
The computation in (b) references them and thus the garbage collector
doesn't free them between (a) and (b), and the optimizer cannot fuse (a) and
(b) in all cases.
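
A rough sketch of options #2 and #3 with Prelude.readFile, in case it helps.
The function names and the scratch file are hypothetical, and I am using
the character codes as a stand-in for the bytes. In sumThenLength the
string is still referenced by the length computation while the sum is
being evaluated, so the already-consumed part typically cannot be collected.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Char (ord)
import Data.Int (Int32)
import Data.List (foldl')

-- Option #2: one pass, accumulating (a) and (b) together.
-- The bang patterns keep both accumulators evaluated at every step.
sumAndLength :: String -> (Int32, Int)
sumAndLength = foldl' step (0, 0)
  where
    step (!s, !n) c = (s + fromIntegral (ord c), n + 1)

-- Option #3: two passes over the same lazily read string.
-- While byteSum is computed, len still refers to str, so the consumed
-- prefix stays reachable and the whole file ends up in memory.
sumThenLength :: String -> (Int32, Int)
sumThenLength str = (byteSum, len)
  where
    byteSum = foldl' (\acc c -> acc + fromIntegral (ord c)) 0 str
    len     = length str

main :: IO ()
main = do
  let path = "lazy-io-demo.txt"          -- hypothetical scratch file
  writeFile path (replicate 100000 'a')
  contents <- readFile path
  print (sumAndLength contents)          -- (9700000,100000)
  contents' <- readFile path
  print (sumThenLength contents')        -- (9700000,100000)
```

Both print the same pair; the difference only shows up in the heap profile
on a large file.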
There is a fourth option, and that is to use strict io, but then each of
the above takes space O(filesize).
I hope that makes sense. It's getting late here and I could be talking
nonsense, but I have tried the above 3 cases in the past and as best as I
can recall those were my findings.