[Haskell-cafe] Binary parser combinators and pretty printing
Tomasz Zielonka
tomasz.zielonka at gmail.com
Tue Sep 13 17:31:56 EDT 2005
On Tue, Sep 13, 2005 at 06:03:00PM +0300, Einar Karttunen wrote:
> We will use the following Haskell datatype:
>
> data Packet = Packet Word32 Word32 Word32 [FastString]
>
> 1) Simple monadic interface
>
> [...]
>
> This works but writing the code gets tedious and dull.
>
> 2) Using better combinators
>
> packet = w32be <> w32be <> w32be <> lengthPrefixList w32be (lengthPrefixList w32be bytes)
> getPacket = let (mid,sid,rid,vars) = getter packet in Packet mid sid rid vars
> putPacket (Packet mid sid rid vars) = setter packet mid sid rid vars
>
> Maybe even the tuple could be eliminated by using a little of TH.
> Has anyone used combinators like this before and how did it work?
No need for TH. If you have a monadic interface, you can write getPacket
as:
    getPacket = (return Packet) `ap` w32be `ap` w32be `ap` w32be
                    `ap` lengthPrefixList w32be (lengthPrefixList w32be bytes)
There's more trouble with putPacket though.
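To make the `ap` idiom concrete, here is a minimal, self-contained sketch. The Get type, byte, and this version of lengthPrefixList are toy stand-ins invented for illustration, not the combinators from the original message, and [Word8] stands in for FastString:

```haskell
import Control.Monad (ap, liftM, replicateM)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical toy parser: consumes a list of bytes and may fail.
newtype Get a = Get { runGet :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Get where
  fmap = liftM

instance Applicative Get where
  pure x = Get (\s -> Just (x, s))
  (<*>)  = ap

instance Monad Get where
  Get g >>= k = Get (\s -> g s >>= \(a, s') -> runGet (k a) s')

-- Read a single byte; fails on empty input.
byte :: Get Word8
byte = Get (\s -> case s of { [] -> Nothing; (b:bs) -> Just (b, bs) })

-- Read a big-endian 32-bit word.
w32be :: Get Word32
w32be = do
  bs <- replicateM 4 byte
  return (foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 bs)

-- Read an element count, then that many items.
lengthPrefixList :: Get Word32 -> Get a -> Get [a]
lengthPrefixList len item = do
  n <- len
  replicateM (fromIntegral n) item

-- [Word8] is an assumption standing in for FastString.
data Packet = Packet Word32 Word32 Word32 [[Word8]]
  deriving (Eq, Show)

-- Each `ap` feeds one more parsed field to the constructor.
getPacket :: Get Packet
getPacket = return Packet `ap` w32be `ap` w32be `ap` w32be
              `ap` lengthPrefixList w32be (lengthPrefixList w32be byte)
```

Loaded into GHCi, `runGet getPacket` applied to a byte list yields the decoded Packet together with the unconsumed input, or Nothing if the input is too short.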
> 3) Using TH entirely
>
> $(getAndPut 'Packet "w32 w32 w32 lengthPrefixList (w32 bytes)")
>
> Is this better than the combinators in 2)? Also what sort of
> syntax would be best for expressing nontrivial dependencies -
> e.g. a checksum calculated from other fields.
How about all of these points together?
a) Simple monadic interface
b) Using better combinators
c) Using TH to generate code for the simple cases
d) Using type-classes
Having a monadic interface doesn't prevent you from introducing other
combinators. In fact, every useful monad should have some combinators
other than >>= and return. There are already some generic monadic
combinators that can simplify your code, as shown in the getPacket
example.
Points c) and d) are closely related - you can introduce a type class
for Binary decodable/encodable datatypes and then generate instances
with TH. The code for these instances is generated directly from the
structure of a datatype and it is quite simple, because it mostly
just calls the type-class methods recursively - this can greatly
simplify the TH code.
So, assuming that you have instances of Binary for Word32 and
FastString and [], making Packet an instance of Binary would amount to
writing
    data Packet = Packet Word32 Word32 Word32 [FastString]
    $(deriveBinary 'Packet)
Manually written instances for Packet would look like this:
    instance Binary Packet where
        decode = f $ f $ f $ f $ return Packet
        encode (Packet mid sid rid vars) = do
            encode mid
            encode sid
            encode rid
            encode vars

    f x = x `ap` decode
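For readers who want to try this, the following is a self-contained sketch of such a class. The Get and Put monads, the big-endian Word32 encoding, the Word32 length prefix for lists, and the use of [Word8] in place of FastString are all assumptions made for illustration; only the Packet instance and f follow the text above:

```haskell
import Control.Monad (ap, liftM, replicateM)
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical toy decoder: state over a byte list, with failure.
newtype Get a = Get { runGet :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Get where fmap = liftM
instance Applicative Get where
  pure x = Get (\s -> Just (x, s))
  (<*>)  = ap
instance Monad Get where
  Get g >>= k = Get (\s -> g s >>= \(a, s') -> runGet (k a) s')

-- Hypothetical toy encoder: a writer that accumulates output bytes.
newtype Put a = Put { runPut :: (a, [Word8]) }

instance Functor Put where fmap = liftM
instance Applicative Put where
  pure x = Put (x, [])
  (<*>)  = ap
instance Monad Put where
  Put (a, w) >>= k = let Put (b, w') = k a in Put (b, w ++ w')

class Binary a where
  decode :: Get a
  encode :: a -> Put ()

instance Binary Word8 where
  decode   = Get (\s -> case s of { [] -> Nothing; (b:bs) -> Just (b, bs) })
  encode b = Put ((), [b])

-- Big-endian is an arbitrary choice for this sketch.
instance Binary Word32 where
  decode = do
    bs <- replicateM 4 (decode :: Get Word8)
    return (foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 bs)
  encode w = mapM_ (\n -> encode (fromIntegral (w `shiftR` n) :: Word8))
                   [24, 16, 8, 0]

-- Lists are written as a Word32 element count followed by the elements.
instance Binary a => Binary [a] where
  decode = do
    n <- (decode :: Get Word32)
    replicateM (fromIntegral n) decode
  encode xs = do
    encode (fromIntegral (length xs) :: Word32)
    mapM_ encode xs

-- [Word8] is an assumption standing in for FastString.
data Packet = Packet Word32 Word32 Word32 [[Word8]]
  deriving (Eq, Show)

instance Binary Packet where
  decode = f $ f $ f $ f $ return Packet
  encode (Packet mid sid rid vars) = do
    encode mid
    encode sid
    encode rid
    encode vars

-- Decode one more constructor argument and apply it.
f :: Binary a => Get (a -> b) -> Get b
f x = x `ap` decode

-- Encode a packet, then decode the resulting bytes.
roundTrip :: Packet -> Maybe (Packet, [Word8])
roundTrip p = runGet decode (snd (runPut (encode p)))
```

Because every field type has its own instance, the Packet instance never mentions a wire format. That is also what would keep a TH-generated version short.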
Unfortunately the world is not that simple, so you'll probably need
a somewhat more complicated framework to handle varying endianness,
varying encodings for the same types, and strange encoding schemes
(like DNS packet compression, or <number of records> fields far
away from the record sequences).
To some degree it can be solved by introducing newtypes or making
more complicated typeclasses.
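As one possible reading of the newtype approach, the sketch below gives the same underlying Word32 two wire formats by wrapping it in two newtypes. The names BE32 and LE32 are hypothetical, and the class is a simplified pure variant (byte lists in, byte lists out) rather than the monadic one, to keep the example short:

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Simplified, non-monadic class used only for this sketch.
class Binary a where
  encode :: a -> [Word8]
  decode :: [Word8] -> Maybe (a, [Word8])

-- Hypothetical wrappers: the type selects the byte order.
newtype BE32 = BE32 Word32 deriving (Eq, Show)
newtype LE32 = LE32 Word32 deriving (Eq, Show)

-- Most significant byte first.
toBytesBE :: Word32 -> [Word8]
toBytesBE w = [fromIntegral (w `shiftR` n) | n <- [24, 16, 8, 0]]

fromBytesBE :: [Word8] -> Word32
fromBytesBE = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

instance Binary BE32 where
  encode (BE32 w) = toBytesBE w
  decode bs = case splitAt 4 bs of
    (hd, rest) | length hd == 4 -> Just (BE32 (fromBytesBE hd), rest)
    _                           -> Nothing

instance Binary LE32 where
  encode (LE32 w) = reverse (toBytesBE w)
  decode bs = case splitAt 4 bs of
    (hd, rest) | length hd == 4 -> Just (LE32 (fromBytesBE (reverse hd)), rest)
    _                           -> Nothing
```

A record declared with LE32 fields would then pick up little-endian encoding from a derived or hand-written instance without any per-field annotation, at the cost of wrapping and unwrapping at the use sites.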
I've played with such frameworks a couple of times and I feel it's time
to make a library useful for others. If you're interested, we could
cooperate.
> 4) Using a syntax extension
If there is any extension that would help here, I think it should be
something more general than merely a syntax for specifying binary
format. This problem seems like a good use for generics and TH.
Best regards
Tomasz