[Haskell-cafe] Binary parser combinators and pretty printing

Tomasz Zielonka tomasz.zielonka at gmail.com
Tue Sep 13 17:31:56 EDT 2005


On Tue, Sep 13, 2005 at 06:03:00PM +0300, Einar Karttunen wrote:
> We will use the following Haskell datatype:
> 
> data Packet = Packet Word32 Word32 Word32 [FastString]
> 
> 1) Simple monadic interface
>
> [...]
>
> This works but writing the code gets tedious and dull. 
>
> 2) Using better combinators
> 
> packet = w32be <> w32be <> w32be <> lengthPrefixList w32be (lengthPrefixList w32be bytes)
> getPacket = let (mid,sid,rid,vars)  = getter packet in Packet mid sid rid vars
> putPacket (Packet mid sid rid vars) = setter packet mid sid rid vars
> 
> Maybe even the tuple could be eliminated by using a little of TH.
> Has anyone used combinators like this before and how did it work?
 
No need for TH. If you have a monadic interface, you can write getPacket
as:

getPacket = (return Packet) `ap` w32be `ap` w32be `ap` w32be `ap` lengthPrefixList w32be (lengthPrefixList w32be bytes)
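
To make the `ap` idiom concrete, here is a minimal self-contained sketch.
The Get type, runGet, byte, and the use of [[Word8]] in place of
[FastString] are assumptions made for illustration; they are not the
combinators from the quoted message, only plausible stand-ins with the
same shape.

```haskell
import Control.Monad (ap, liftM, replicateM)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical minimal parser: consumes a list of bytes and may fail.
newtype Get a = Get { runGet :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Get where
  fmap = liftM

instance Applicative Get where
  pure x = Get (\bs -> Just (x, bs))
  (<*>)  = ap

instance Monad Get where
  Get g >>= k = Get $ \bs -> case g bs of
    Nothing       -> Nothing
    Just (x, bs') -> runGet (k x) bs'

-- Read one big-endian 32-bit word.
w32be :: Get Word32
w32be = Get $ \bs -> case bs of
  (a:b:c:d:rest) ->
    Just (foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d], rest)
  _ -> Nothing

-- Read a single byte (assumed stand-in for the `bytes` combinator).
byte :: Get Word8
byte = Get $ \bs -> case bs of
  (b:rest) -> Just (b, rest)
  []       -> Nothing

-- Read a length with the first parser, then that many items.
lengthPrefixList :: Integral n => Get n -> Get a -> Get [a]
lengthPrefixList len item = do
  n <- len
  replicateM (fromIntegral n) item

-- [[Word8]] stands in for [FastString] to keep the sketch dependency-free.
data Packet = Packet Word32 Word32 Word32 [[Word8]] deriving (Eq, Show)

-- Each `ap` feeds the next parsed field to the partially applied constructor.
getPacket :: Get Packet
getPacket = return Packet `ap` w32be `ap` w32be `ap` w32be
              `ap` lengthPrefixList w32be (lengthPrefixList w32be byte)
```

Running `runGet getPacket` on a byte list yields the packet together with
the unconsumed input, or Nothing if the input is too short.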

There's more trouble with putPacket though.

> 3) Using TH entirely
> 
> $(getAndPut 'Packet "w32 w32 w32 lengthPrefixList (w32 bytes)")
> 
> Is this better than the combinators in 2)? Also what sort of 
> syntax would be best for expressing nontrivial dependencies - 
> e.g. a checksum calculated from other fields.

How about combining all of these points?

a) Simple monadic interface
b) Using better combinators
c) Using TH to generate code for the simple cases
d) Using type-classes

Having a monadic interface doesn't prevent you from introducing other
combinators. In fact, every useful monad should have some combinators
other than >>= and return. There are already some generic monadic
combinators that can simplify your code, as shown in the getPacket
example.

Points c) and d) are closely related: you can introduce a type class
for Binary decodable/encodable datatypes and then generate instances
with TH. The code for these instances is generated directly from the
structure of a datatype and it is quite simple, because it mostly just
calls the type-class methods recursively. This can greatly simplify
the TH code.

So, assuming that you have instances of Binary for Word32 and
FastString and [], making Packet an instance of Binary would amount to
writing

  data Packet = Packet Word32 Word32 Word32 [FastString]

  $(deriveBinary 'Packet)

A manually written instance for Packet would look like this:

  instance Binary Packet where
    decode = f $ f $ f $ f $ return Packet

    encode (Packet mid sid rid vars) = do
        encode mid
        encode sid
        encode rid
        encode vars

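  -- Feed the next decoded field to a decoder that yields a function
  -- (needs `import Control.Monad (ap)`).
  f x = x `ap` decode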
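
For anyone who wants to try the manual instance, the following is a
self-contained sketch of what the surrounding framework might look like.
Everything except the Packet instance and f is an assumption: the toy Get
and Put monads, the big-endian Word32 encoding, the Word32 length prefix
for lists, and [[Word8]] standing in for [FastString] are all made up for
illustration.

```haskell
import Control.Monad (ap, liftM, replicateM)
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Toy decoder (assumed): consumes bytes from a list and may fail.
newtype Get a = Get { runGet :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Get where fmap = liftM
instance Applicative Get where
  pure x = Get (\bs -> Just (x, bs))
  (<*>)  = ap
instance Monad Get where
  Get g >>= k = Get $ \bs -> case g bs of
    Nothing       -> Nothing
    Just (x, bs') -> runGet (k x) bs'

-- Toy encoder (assumed): a writer that accumulates output bytes.
newtype Put a = Put { runPut :: (a, [Word8]) }

instance Functor Put where fmap = liftM
instance Applicative Put where
  pure x = Put (x, [])
  (<*>)  = ap
instance Monad Put where
  Put (x, out) >>= k = let Put (y, out') = k x in Put (y, out ++ out')

class Binary a where
  decode :: Get a
  encode :: a -> Put ()

-- Word32 as four big-endian bytes (an arbitrary choice for this sketch).
instance Binary Word32 where
  decode = Get $ \bs -> case bs of
      (a:b:c:d:rest) -> Just (foldl step 0 [a, b, c, d], rest)
      _              -> Nothing
    where step acc x = (acc `shiftL` 8) .|. fromIntegral x
  encode w = Put ((), [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]])

instance Binary Word8 where
  decode = Get $ \bs -> case bs of
      (b:rest) -> Just (b, rest)
      []       -> Nothing
  encode b = Put ((), [b])

-- Lists are prefixed with their length as a Word32 (assumed convention).
instance Binary a => Binary [a] where
  decode = do
    n <- decode :: Get Word32
    replicateM (fromIntegral n) decode
  encode xs = do
    encode (fromIntegral (length xs) :: Word32)
    mapM_ encode xs

data Packet = Packet Word32 Word32 Word32 [[Word8]] deriving (Eq, Show)

-- The signature is needed because f is used at several field types.
f :: Binary a => Get (a -> b) -> Get b
f x = x `ap` decode

instance Binary Packet where
  decode = f $ f $ f $ f $ return Packet
  encode (Packet mid sid rid vars) = do
    encode mid
    encode sid
    encode rid
    encode vars
```

With these definitions, `snd (runPut (encode p))` gives the bytes for a
packet p, and `runGet decode` on those bytes should give p back.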

Unfortunately the world is not that simple, so you'll probably need
a somewhat more complicated framework to handle varying endianness,
varying encodings for the same types, and strange encoding schemes
(like DNS packet compression, or <number of records> fields far
away from the record sequences).

To some degree this can be solved by introducing newtypes or by making
the type classes more complicated.
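
As a rough illustration of the newtype approach, the sketch below lets
the type of a field select its byte order. The names BE32, LE32, Mixed,
toBytes and fromBytes are hypothetical, and the class is simplified to
pure functions over byte lists instead of a monadic interface, just to
keep the example short.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Simplified class for this sketch: pure functions instead of monads.
class Binary a where
  encode :: a -> [Word8]
  decode :: [Word8] -> Maybe (a, [Word8])

-- Hypothetical wrappers: the type records the byte order.
newtype BE32 = BE32 Word32 deriving (Eq, Show)
newtype LE32 = LE32 Word32 deriving (Eq, Show)

-- Split a word into four bytes, most significant first.
toBytes :: Word32 -> [Word8]
toBytes w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Combine bytes into a word, most significant first.
fromBytes :: [Word8] -> Word32
fromBytes = foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0

instance Binary BE32 where
  encode (BE32 w) = toBytes w
  decode bs = case splitAt 4 bs of
    (chunk, rest) | length chunk == 4 -> Just (BE32 (fromBytes chunk), rest)
    _                                 -> Nothing

instance Binary LE32 where
  encode (LE32 w) = reverse (toBytes w)
  decode bs = case splitAt 4 bs of
    (chunk, rest) | length chunk == 4 -> Just (LE32 (fromBytes (reverse chunk)), rest)
    _                                 -> Nothing

-- A record whose fields use different byte orders.
data Mixed = Mixed BE32 LE32 deriving (Eq, Show)

instance Binary Mixed where
  encode (Mixed a b) = encode a ++ encode b
  decode bs = do
    (a, rest)  <- decode bs
    (b, rest') <- decode rest
    return (Mixed a b, rest')
```

The instance for Mixed never mentions endianness; it is decided entirely
by the field types, which is what would make such instances easy to
generate mechanically.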

I've played with such frameworks a couple of times and I feel it's time
to make a library useful for others. If you're interested, we could
cooperate.

> 4) Using a syntax extension

If there is any extension that would help here, I think it should be
something more general than merely a syntax for specifying binary
formats. This problem seems like a good use for generics and TH.

Best regards
Tomasz

