[Haskell-cafe] Re: Brainstorming on how to parse IMAP
haskell at list.mightyreason.com
Tue Aug 5 16:49:24 EDT 2008
I am glad you asked Ben,
Short answer: It can return a Seq of your values. The values in the Seq are
lazy, the Seq itself is finite. It can return what it has so far before it
finishes parsing (or even before the rest of the input has arrived from the
Ben Franksen wrote:
> ChrisK wrote:
>> I recently posted new and fancy binary Get monads in
>> which might be of interest since network protocol are usually specified in
>> bytes at the wire level.
>> The latest one takes input which may or may not be complete and returns
>> stream (a Seq) of results.
> IIRC Seq is not a 'Stream' but a strict sequence? Or do you meant 'a stream
> (of Seq)'?
I meant it returns many (Seq y), one after the other, while doing parsing in
The complicated parser looks like this. Start with the run function:
runCompGet :: (Monad m,Monoid w)
=> CompGet a y r w user m a
-> r -> user -> L.ByteString
-> m (CompResult y w user m a)
This takes a CompGet and a reader state r and a user state user and the
(initial) input L.ByteString (Data.ByteString.Lazy.ByteString). It evaluates to
the inner monad 'm' returning a CompResult.
The CompResult is a three-fold type:
> data CompResult y w user m a =
> CFailed (Seq y) !Int64 String
> | CFinished (Seq y) !L.ByteString !Int64 w user a
> | CPartial (Seq y) (Either ( m (CompResult y w user m a) )
> ( Maybe L.ByteString -> m (CompResult y w user m a) ))
All three have (Seq y) which are the Data.Sequence.Seq of things which have been
queued by "yieldItem".
CFailed also has the Int64 count of bytes parsed successfully and an error
message String. Nothing more can be done.
CFinished also has the unused tail of the input as a L.ByteString and an Int64
of the bytes consumed. And the output of the writer w, the final user state,
and lastly it has the end value returned by the computation which has type 'a'.
Nothing more can be done.
CPartial is the intermediate result. It also carries Either:
Left : the rest of the computation, currently suspended, to continue running.
Right: a function from (Maybe ByteString) to the suspended computation.
The Left is a result of the "flushItems" command and is merely a way to return
the (Seq y) so far before continuing.
The Right is a result of running out of input data. This allows the program to
feed more input into the parser which will be appended to all the previous
input. One does this by passing (Just someByteString) to the function. If the
parser again runs out of data it will again return CPartial with a Right value.
Alternatively, one can pass Nothing. This tells the parser that there will
never ever be more input. The parser will never ask for (though it may
flushItems and return a Right valued CPartial).
A key thing about the (Seq y) is that yielded items are only returned once. The
CPartial may be returned many times and each time it will have an empty list or
fresh list of (Seq y). The values in the Seq are lazy, the Seq itself is
finite. To collect all the value the caller has to concatenate all the (Seq
y)'s that are returned during parsing.
As for parsing, the module offers the usual BinaryParser interface (package
binary-strict) and has an interface which mostly overlaps Data.Binary.Get
(package binary). For example: it has "getByteString" and "getWord64be" and
You don't have to use the "yieldItem" command. You can just return the results
in the final "a" return value (or the "user" state or the writer "w" value). In
this situation you only get an answer when CFinished is returned (and nothing if
CFailed is returned). I could not use the writer mechanism for yield-ing
because the "listen" and "pass" parts of the MonadWriter class ensure it has the
You might wonder:
*) If the parser code uses MonadPlus to give several alternatives
*) The first alternative gets more input via CPartial (perhaps several times)
*) The first alternative then fails
*) The second alternative starts parsing from the same position the first did
*) Does the second alternative see the new input passed to CPartial earlier?
The answer is yes. Changes to the input stream by appending with CPartial
affect the whole computation and are never rolled back.
You might wonder:
*) If the first alternative calls "yieldItem foo"
*) The first alternative fails
*) The second alternative calls flushItems
*) Does the CPartial (Seq y) contain "foo"?
The answer is yes. Items yielded are never rolled back.
[ Doing all of this in the presence of throwError/mzero/fail and lookAhead* and
callCC was interesting to code. ]
More information about the Haskell-Cafe