[Haskell-cafe] Brainstorming on how to parse IMAP

Tue Aug 5 13:43:03 EDT 2008

Donn Cave wrote:
> I mentioned that my parser may return an incomplete status.  In principle,
> something like
> 
>   parseResponse :: ByteString -> Maybe (IMAPResponse, ByteString)
> 
> That means the parse needs to be repeated each time until enough input
> data has accumulated.  I worried a little about how to represent a
> useful incomplete parse state, but decided that it isn't worth the
> trouble - the amount of parsing in an ordinary response is too trivial.
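
For concreteness, here is a minimal sketch of that re-parse-until-complete
scheme.  It assumes a toy response type (one CRLF-terminated line) in place
of a real IMAPResponse; Line, parseLine and feedChunks are hypothetical
names for illustration, not part of any library.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for IMAPResponse: one line, CRLF stripped.
newtype Line = Line B.ByteString deriving (Eq, Show)

-- Nothing means "incomplete: call again once more input has arrived".
parseLine :: B.ByteString -> Maybe (Line, B.ByteString)
parseLine buf =
  case B.breakSubstring (B.pack "\r\n") buf of
    (_, rest) | B.null rest -> Nothing
    (line, rest)            -> Just (Line line, B.drop 2 rest)

-- Append each chunk to the buffer and re-run the parser from the start
-- of the buffer until it reports that it needs more data.
feedChunks :: [B.ByteString] -> ([Line], B.ByteString)
feedChunks = go B.empty []
  where
    go buf acc []       = (reverse acc, buf)
    go buf acc (c : cs) = drain (B.append buf c) acc cs
    drain buf acc cs =
      case parseLine buf of
        Just (l, rest) -> drain rest (l : acc) cs
        Nothing        -> go buf acc cs
```

The cost is that an incomplete response is scanned again on every new
chunk, which is only acceptable when responses are short.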

But the problem here is getting that first ByteString in the first place.

Are you suggesting it would be the result of hGetContents or somesuch,
reading from a socket?

I've tried this sort of approach with my FTP library.  It can be done,
but it is exceptionally tricky and I wouldn't do it again.  You have to
make extremely careful use of things like try in Parsec, and even of
regular choices (sometimes the parser wants to read a character that
doesn't exist yet).  Buffering plays into it too.

The trick with reading from the network in a back-and-forth protocol is
knowing how much to read.  You have to be very careful here.  If you try
to read too much (and are blocking until you read), you will get
deadlock because you are reading data that the other end isn't going to
send yet.

And that, as I see it, is the problem with the above.  It seems to be a
chicken-and-egg problem in my mind: how do you know how much data to
read until you've parsed the last bit of data that tells you how to read
the next bit?
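
One concrete instance in IMAP is the counted literal: a line ending in
{n} announces that exactly n bytes follow the CRLF.  A rough sketch of
how a parsed line could dictate the next read is below.  The names Next,
literalSize and nextRead are made up for illustration, and the input is
assumed to be a single line with its CRLF already removed.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit)

-- What the reader should fetch next from the connection.
data Next = NextLine | NextBytes Int deriving (Eq, Show)

-- If the line ends in "{n}", return n; otherwise Nothing.
literalSize :: B.ByteString -> Maybe Int
literalSize line
  | B.null line || B.last line /= '}' = Nothing
  | otherwise =
      let body             = B.init line               -- drop the '}'
          (before, digits) = B.spanEnd isDigit body    -- trailing digits
      in if not (B.null digits) && not (B.null before) && B.last before == '{'
           then fmap fst (B.readInt digits)
           else Nothing

-- Only after looking at the line do we know how to read what follows.
nextRead :: B.ByteString -> Next
nextRead = maybe NextLine NextBytes . literalSize
```

With this shape the IO code reads a line, asks nextRead what to do, and
only then issues either another line read or a read of exactly n bytes,
so it never blocks waiting for data the server has not promised.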

> | 3) The linkage between Parsec and IO is weak.  I cannot write an
> | "IMAPResponse" parser.  I would have to write a set of parsers to parse
> | individual components of the IMAP response as part of the IO monad code
> | that reads the IMAP response, since the result of one dictates how much
> | network data I attempt to read.
> 
> 
> The parser should just parse data, and not read it.
> 
> You don't need to worry about whether you can get recv(2) semantics on
> a socket with bytestrings, and you don't need to saddle users of this
> parser with whatever choice you might make there.  You don't need to
> supply an SSL input function with the same semantics, or account for
> the possibility that data might actually not be coming from a socket
> at all (UNIX pipes are not unheard of.)  You don't need to lock users
> into whatever execution dispatching might be supported by that I/O,
> potentially ruling out graphics libraries etc. that might not be compatible.
> 
> So I would let the application come up with the data.  In general,

Well, your response here raises the question of how much you want to
automate from the application.  Yes, there are multiple ways of
communicating with IMAP servers, but you have the same synchronization
issues with all of them.  Yes, I plan to let the application supply
functions to read data.  But in the end, that is pointless if those
functions can't be written in Haskell!

> I don't think there's any way to specify how much to read - I mean,
> the counted literal certainly provides that information, but that's the
> exception - so I would assume the application will need some kind of
> recv(2)-like function that reads data as available.

Exactly.  But there is no recv()-like function, except the one that
returns IO String.  There is no recv()-like function that returns IO
ByteString.

(I actually notice now a package on Hackage that does this... why it's
not in core, I don't know.)
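
As a sketch of what such a function makes possible, assuming a version
of the bytestring package that provides hGetSome (which returns whatever
is available, blocking only when nothing is): the application supplies
the read action, and a small driver keeps feeding the parser until it
succeeds.  recvSome and parseWith are hypothetical names, and the 4096
buffer size is arbitrary.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (Handle)

-- recv(2)-like read on a Handle: returns at most 4096 bytes, blocks only
-- if no data is available yet, and yields an empty string at EOF.
recvSome :: Handle -> IO B.ByteString
recvSome h = B.hGetSome h 4096

-- Run a "Maybe means incomplete" parser, calling the supplied read
-- action whenever the buffered input is not yet a complete response.
parseWith :: IO B.ByteString
          -> (B.ByteString -> Maybe (a, B.ByteString))
          -> B.ByteString
          -> IO (Maybe (a, B.ByteString))
parseWith readMore parser = loop
  where
    loop buf =
      case parser buf of
        Just result -> return (Just result)
        Nothing -> do
          chunk <- readMore
          if B.null chunk
            then return Nothing            -- EOF before a full response
            else loop (B.append buf chunk)
```

For a socket-backed Handle h, the call would look like
parseWith (recvSome h) parser B.empty; an SSL layer or a pipe would only
need to supply a different read action.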

> By the way, if you haven't already run across this, you may be interested
> to read about the IMAP "IDLE" command, cf. RFC2177.  I think the value
> of this feature can be overstated, but it depends on the server, and some
> IMAP client implementers are very fond of it.  At this point, the reason
> it might be interesting is that it moves away from the call & response
> pattern.

It's on my todo list.

-- John