[Haskell-cafe] Brainstorming on how to parse IMAP

John Goerzen jgoerzen at complete.org
Sat Aug 2 22:04:28 EDT 2008


Hi folks,

I'm interested in writing a library to work with IMAP servers.

I'm interested in thoughts people have on parsing libraries and methods.
I'm a huge fan of Parsec overall -- it lets me have a single-stage
parser, for instance.  But it isn't sufficiently lazy for this task, and
I probably will need to deal with ByteStrings instead of Strings, since
some IMAP messages may be 30MB or more.

So to give a very, very brief rundown of RFC 3501, there are lots of ways
that an IMAP server can encode things.  For instance, we could see this:

A283 SEARCH "TEXT" "string not in mailbox"

which is the same as:

A283 SEARCH TEXT "string not in mailbox"

and the same as:

A283 SEARCH {4} "string not in mailbox"
TEXT

The braces mean that the given number of octets follows after the CRLF
at the end of the given line.  We could even see:

A283 SEARCH {4} {21}
TEXTstring not in mailbox
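
As a rough illustration of the brace form, here is a minimal sketch of
recognising one literal over a lazy ByteString. The function name `literal`
is made up for this example, and it assumes the shape shown earlier where a
single {n} is followed directly by CRLF and then n octets. It deliberately
does not check that all n octets are present, since doing so would force
the whole payload into memory.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Char (isDigit)

-- | Hypothetical helper: if the input starts with "{n}\r\n", return the
-- next n octets and whatever follows them. Returns Nothing if the input
-- does not start with a well-formed literal header.
--
-- The payload is a lazy ByteString, so a 10MB literal is not read until
-- somebody consumes it. If the input ends early the payload is simply
-- shorter than n; this sketch leaves that check to the caller.
literal :: L.ByteString -> Maybe (L.ByteString, L.ByteString)
literal input = do
  ('{', afterBrace) <- L.uncons input
  let (digits, afterDigits) = L.span isDigit afterBrace
  (n, _) <- L.readInt digits            -- Nothing when there are no digits
  let header = L.pack "}\r\n"
  if header `L.isPrefixOf` afterDigits
    then Just (L.splitAt (fromIntegral n)
                         (L.drop (L.length header) afterDigits))
    else Nothing
```

Applied to the third SEARCH example (everything after "A283 SEARCH "), this
would hand back TEXT as the payload and leave the quoted string as the
remaining input.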

Note that when downloading messages, I would fully expect to see things like:

* FETCH {10485760}

representing a 10MB message.

Also, quoted strings have escaping rules.
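
To make the escaping concrete: in RFC 3501 a backslash inside a quoted
string may only be followed by a double quote or another backslash. A
minimal sketch of unescaping over a strict ByteString might look like the
following. The name `quoted` is hypothetical, and the sketch does not
reject CR or LF inside the string, which the real grammar forbids.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Hypothetical helper: parse a quoted string at the start of the input.
-- Returns the unescaped contents and the input after the closing quote,
-- or Nothing on a missing quote or an invalid escape.
quoted :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
quoted input = do
  ('"', body) <- B.uncons input
  go [] body
  where
    special c = c == '"' || c == '\\'
    -- acc holds the pieces seen so far, most recent first.
    go acc s =
      let (chunk, rest) = B.break special s
      in case B.uncons rest of
           -- Closing quote: glue the pieces back together.
           Just ('"', after) ->
             Just (B.concat (reverse (chunk : acc)), after)
           -- Backslash: the next character must be a quote or backslash.
           Just ('\\', esc) ->
             case B.uncons esc of
               Just (c, after) | special c ->
                 go (B.singleton c : chunk : acc) after
               _ -> Nothing
           -- Ran out of input before the closing quote.
           _ -> Nothing
```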

[ please note that the above is paraphrased and isn't really true
RFC 3501, for simplicity's sake ]

Now then...  some goals.

1) Ideally I could parse stuff lazily.  I have tried this with FTP and
it is more complex than it seems at first, due to making sure you never,
never, never consume too much data.  But being able to parse lazily
would make it so incredibly easy to issue a command saying "download all
new mail", and things get written to disk as they come in, with no
buffer at all.

2) Avoiding Strings wherever possible.

3) Avoiding complex buffering schemes where I have to manually buffer
data packets.
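
For comparison, here is a sketch of what the non-lazy baseline for goal 1
might look like: copying exactly n octets of a literal from one Handle to
another in fixed-size chunks, so that nothing past the literal is consumed
and the whole message is never held in memory. The names `copyOctets` and
`saveLiteral` and the 64KB chunk size are assumptions made for
illustration only, and this is exactly the kind of manual chunking that
goal 3 hopes to avoid.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode, WriteMode), withBinaryFile)

-- | Hypothetical helper: copy up to n octets from src to dst in chunks.
-- Returns the number of octets still missing (0 if all n were copied).
-- Each read asks for at most the number of octets still wanted, so the
-- data after the literal stays available to the next reader of src.
copyOctets :: Handle -> Handle -> Int -> IO Int
copyOctets src dst = go
  where
    chunkSize = 64 * 1024 :: Int
    go remaining
      | remaining <= 0 = return 0
      | otherwise = do
          chunk <- B.hGet src (min chunkSize remaining)
          if B.null chunk
            then return remaining          -- input ended early
            else do
              B.hPut dst chunk
              go (remaining - B.length chunk)

-- | Hypothetical usage: write the first n octets of one file to another.
saveLiteral :: FilePath -> FilePath -> Int -> IO Int
saveLiteral from to n =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      copyOctets src dst n
```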

Thoughts and ideas?

BTW, if any of you have heard of OfflineIMAP, yes I am considering
rewriting OfflineIMAP in Haskell.

-- John

