[Haskell-cafe] Brainstorming on how to parse IMAP

Tue Aug 5 12:42:27 EDT 2008

Quoth John Goerzen <jgoerzen at complete.org>:
...
| One problem with that is that if I use specific parsing library foo,
| then only others that are familiar with specific parsing library foo can
| hack on it.

Well, you asked.  For me, the parsing is a relatively minor part of the
problem and maybe not one that benefits greatly from external support.
MIME might be a different story.

| 2) A lot of RFC protocols -- and IMAP in particular -- can involve
| complex responses from the server with hierarchical data, and the parse
| of, say, line 1 and of each successive line can indicate whether or not
| to read more data from the server.  Parsing of these lines is a stateful
| activity.

I mentioned that my parser may return an incomplete status.  In principle,
something like this, where Nothing means the input so far is incomplete:

  parseResponse :: ByteString -> Maybe (IMAPResponse, ByteString)

That means the parse needs to be repeated each time until enough input
data has accumulated.  I worried a little about how to represent a
useful incomplete parse state, but decided that it isn't worth the
trouble - the amount of parsing in an ordinary response is too trivial.
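
To make the "repeat the parse" idea concrete, here is a minimal sketch,
not the actual parser.  It assumes strict ByteStrings, and IMAPResponse
here is a hypothetical stand-in that holds one CRLF-terminated line; the
helper name feed is likewise invented for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for a real response type: one CRLF-terminated line.
newtype IMAPResponse = IMAPResponse B.ByteString
  deriving (Eq, Show)

-- Nothing means "incomplete, supply more input"; no partial state is kept.
parseResponse :: B.ByteString -> Maybe (IMAPResponse, B.ByteString)
parseResponse buf =
  case B.breakSubstring (B.pack "\r\n") buf of
    (line, rest)
      | B.null rest -> Nothing                 -- no CRLF seen yet
      | otherwise   -> Just (IMAPResponse line, B.drop 2 rest)

-- Append a new chunk to the leftover buffer and re-run the parser from
-- scratch, returning every complete response plus the new leftover.
feed :: B.ByteString -> B.ByteString -> ([IMAPResponse], B.ByteString)
feed leftover chunk = go (B.append leftover chunk)
  where
    go buf = case parseResponse buf of
      Nothing        -> ([], buf)
      Just (r, rest) -> let (rs, left) = go rest in (r : rs, left)
```

Re-parsing the accumulated buffer costs a rescan per chunk, which is the
trade mentioned above: acceptable when responses are short, and it also
copes with a CRLF that happens to be split across two chunks.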

| 3) The linkage between Parsec and IO is weak.  I cannot write an
| "IMAPResponse" parser.  I would have to write a set of parsers to parse
| individual components of the IMAP response as part of the IO monad code
| that reads the IMAP response, since the result of one dictates how much
| network data I attempt to read.

The parser should just parse data, and not read it.

You don't need to worry about whether you can get recv(2) semantics on
a socket with bytestrings, and you don't need to saddle users of this
parser with whatever choice you might make there.  You don't need to
supply an SSL input function with the same semantics, or account for
the possibility that data might actually not be coming from a socket
at all (UNIX pipes are not unheard of.)  You don't need to lock users
into whatever execution dispatching might be supported by that I/O,
potentially ruling out graphics libraries etc. that might not be compatible.

So I would let the application come up with the data.  In general,
I don't think there's any way to specify how much to read - I mean,
the counted literal certainly provides that information, but that's the
exception - so I would assume the application will need some kind of
recv(2)-like function that reads data as available.
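
As a rough sketch of that division of labor (the names nextItem, Parser
and crlfLine are made up for illustration, and the toy parser only
recognizes a CRLF-terminated line), the application hands in whatever
recv(2)-like action it has, and the parser stays pure:

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical pure parser type: Nothing means "incomplete, need more input".
type Parser a = B.ByteString -> Maybe (a, B.ByteString)

-- Pull one complete item.  recvSome is supplied by the application and is
-- assumed to return whatever data is available, or empty at end of input.
-- It can wrap a socket, an SSL session, a pipe, or a test fixture.
nextItem :: Monad m
         => m B.ByteString          -- application-supplied read action
         -> Parser a                -- pure parser
         -> B.ByteString            -- leftover from the previous call
         -> m (Maybe (a, B.ByteString))
nextItem recvSome parse = go
  where
    go buf = case parse buf of
      Just result -> return (Just result)
      Nothing     -> do
        chunk <- recvSome
        if B.null chunk
          then return Nothing       -- input ended mid-item
          else go (B.append buf chunk)

-- Toy parser for demonstration only: one CRLF-terminated line.
crlfLine :: Parser B.ByteString
crlfLine buf =
  case B.breakSubstring (B.pack "\r\n") buf of
    (line, rest)
      | B.null rest -> Nothing
      | otherwise   -> Just (line, B.drop 2 rest)
```

Because nextItem only asks for a Monad, nothing here commits the user to
a particular I/O library or event loop.  A counted literal needs no
special read call in this scheme: the parser would simply keep returning
Nothing until the announced number of octets has shown up in the buffer.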

By the way, if you haven't already run across this, you may be interested
to read about the IMAP "IDLE" command, cf. RFC 2177.  I think the value
of this feature can be overstated, but it depends on the server, and some
IMAP client implementors are very fond of it.  At this point, the reason
it might be interesting is that it moves away from the call & response
pattern.

	Donn