[Haskell-cafe] Re: Haskell-Cafe Digest, Vol 78, Issue 14

Thu Feb 11 18:49:58 EST 2010

On Thu, Feb 11, 2010 at 10:00 AM, Gregory Collins
<greg at gregorycollins.net> wrote:
> Maciej Piechotka <uzytkownik2 at gmail.com> writes:
>
>> On Tue, 2010-02-09 at 16:41 +0000, John Lato wrote:
>>>
>>> See http://inmachina.net/~jwlato/haskell/ParsecIteratee.hs for a valid
>>> Stream instance using iteratee.  Also Gregory Collins recently posted
>>> an iteratee wrapper for Attoparsec to haskell-cafe.  To my knowledge
>>> these are not yet in any packages, but hackage is vast.
>>
>> Hmm. Am I correct that his implementation caches everything?
>
> The one that John posted (iteratees on top of parsec) has to keep a copy
> of the entire input, because parsec wants to be able to do arbitrary
> backtracking on the stream.

This is true, however I believe this alternative approach is also
correct.  The Cursor holds the stream state, and parsec holds on to
the Cursor for backtracking.  Data is only read within the Iteratee
monad when it goes beyond the currently available cursors, at which
point another cursor is added to the linked list (implemented with
IORef or other mutable reference).

The downside to this approach is that data is consumed from the
iteratee stream for a partial parse, even if the parse fails.  I did
not want this behavior, so I chose a different approach.

>
>> I tried to rewrite the implementation using... well imperative linked
>> list. For trivial benchmark it have large improvement (althought it may
>> be due to error in test such as using ByteString) and, I believe, that
>> it allows to free memory before finish.
>>
>> Results of test on Core 2 Duo 2.8 GHz:
>> 10:   0.000455s       0.000181s
>> 100:  0.000669s       0.001104s
>> 1000: 0.005209s       0.023704s
>> 10000:        0.053292s       1.423443s
>> 100000:       0.508093s       132.208597s
>

I'm surprised your version has better performance for small numbers of
elements.  I wonder if it's partially due to more aggressive inlining
from GHC or something of that nature.  Or maybe your version compiles
to a tighter loop as elements can be gc'd.

I expected poor performance of my code for larger numbers of elements,
as demonstrated here.

I envisioned the usage scenario where parsers would be relatively
short (<20 chars), and most of the work would be done directly with
iteratees.  In this case it would be more important to preserve the
stream state in the case of a failed parse, and the performance issues
of appending chunks wouldn't arise either.

Cheers,
John