[Haskell-cafe] Interpreting fields in a buffer

Tomasz Zielonka t.zielonka at students.mimuw.edu.pl
Wed Jan 28 11:01:30 EST 2004


On Mon, Jan 26, 2004 at 02:39:32PM -0800, John Meacham wrote:
> On Mon, Jan 26, 2004 at 01:24:37PM +0100, Tomasz Zielonka wrote:
> > 3) Roll your own (de)serialization framework
> > 
> > That's what I did. It's a bit complicated, but I will try to describe it
> > within a couple of days. Right now all I can say is that it uses TH and
> > has a couple of implementations of low-level decoders, one of which
> > reads directly from UArray Int Word8. I managed to achieve throughput of
> > 3 MB / s for quite complicated binary protocols, and I think I can
> > improve that even further.
> 
> Hi, I would be very interested in looking at the design of this. Is it
> available on the web somewhere?

I've attached the code. I compiled it with GHC 6.0.1 (this may
matter because of Template Haskell; I also used other GHC extensions).

There is an Example.hs, which shows how to declare a record for the IP
header and how to use both supplied parsers. This is not a standalone
program, so to try it, compile it with

  ghc -O2 --make Example

and load it in GHCi (you have to compile it first, because the
interpreter can't handle unboxed tuples).
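For readers without the attachment, here is a minimal hand-written sketch of what decoding such a record straight from a UArray Int Word8 can look like. It is not the code from Example.hs: the IPHeader type reuses the field names from the layout sketch further down, beWord and decodeIPHeader are hypothetical helpers, and the buffer is assumed to hold a plain 20-byte header (no options) in network byte order.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word16, Word32)

-- Hypothetical record; field names follow the layout sketch in this message.
data IPHeader = IPHeader
  { iphVersion   :: !Word8
  , iphHeaderLen :: !Word8   -- in 32-bit words
  , iphTOS       :: !Word8
  , iphTotalLen  :: !Word16
  , iphID        :: !Word16
  , iphFragOff   :: !Word16
  , iphTTL       :: !Word8
  , iphProtocol  :: !Word8
  , iphCheck     :: !Word16
  , iphSAddr     :: !Word32
  , iphDAddr     :: !Word32
  } deriving (Show, Eq)

-- Read an n-byte big-endian unsigned value starting at index off.
beWord :: Num a => UArray Int Word8 -> Int -> Int -> a
beWord buf off n = fromIntegral (foldl step (0 :: Word32) [off .. off + n - 1])
  where step acc i = (acc `shiftL` 8) .|. fromIntegral (buf ! i)

-- Decode a fixed 20-byte IPv4 header at offset off;
-- Nothing if the buffer does not contain all 20 bytes.
decodeIPHeader :: UArray Int Word8 -> Int -> Maybe IPHeader
decodeIPHeader buf off
  | off < lo || off + 20 - 1 > hi = Nothing
  | otherwise = Just IPHeader
      { iphVersion   = b0 `shiftR` 4     -- high nibble
      , iphHeaderLen = b0 .&. 0x0F       -- low nibble
      , iphTOS       = u8 1
      , iphTotalLen  = u16 2
      , iphID        = u16 4
      , iphFragOff   = u16 6
      , iphTTL       = u8 8
      , iphProtocol  = u8 9
      , iphCheck     = u16 10
      , iphSAddr     = u32 12
      , iphDAddr     = u32 16
      }
  where
    (lo, hi) = bounds buf
    b0    = buf ! off
    u8 i  = buf ! (off + i)
    u16 i = beWord buf (off + i) 2 :: Word16
    u32 i = beWord buf (off + i) 4 :: Word32
```

Load it in GHCi and apply decodeIPHeader to a listArray of captured bytes; each field is read at a fixed offset, so no intermediate lists of bytes are built for the fields.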

The library is really very simple; I created it quickly to solve some
immediate problems. There is no support for bitfields - I think there
would be if I had had to deal with them ;)

The original library used little-endian encoding. I changed it to
big-endian (aka network order), but there really should be a way to
choose. Right now I am not sure how this choice should be offered to
the user.
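One possible way to offer the choice, sketched under my own assumptions: put the byte order into the field types, so each record field states how it is stored and the class instance picks the matching decoder. The BE/LE wrappers, the FromBytes class, unsigned and the Header record are hypothetical names, not part of the attached library.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Word (Word8, Word16, Word32)

-- Hypothetical wrappers recording the byte order of a field in its type.
newtype BE a = BE a deriving (Show, Eq)
newtype LE a = LE a deriving (Show, Eq)

-- Combine bytes, most significant first.
combine :: Num a => [Word8] -> a
combine = foldl (\acc b -> acc * 256 + fromIntegral b) 0

-- Take exactly n bytes, reorder them with 'order', and combine them.
unsigned :: Num a => ([Word8] -> [Word8]) -> Int -> [Word8] -> Maybe (a, [Word8])
unsigned order n bs
  | length chunk == n = Just (combine (order chunk), rest)
  | otherwise         = Nothing
  where (chunk, rest) = splitAt n bs

wrap :: (a -> b) -> Maybe (a, r) -> Maybe (b, r)
wrap f = fmap (\(x, r) -> (f x, r))

class FromBytes a where
  fromBytes :: [Word8] -> Maybe (a, [Word8])

instance FromBytes (BE Word16) where fromBytes = wrap BE . unsigned id 2
instance FromBytes (BE Word32) where fromBytes = wrap BE . unsigned id 4
instance FromBytes (LE Word16) where fromBytes = wrap LE . unsigned reverse 2
instance FromBytes (LE Word32) where fromBytes = wrap LE . unsigned reverse 4

-- A record mixing byte orders; the field types say which is which.
data Header = Header { hMagic :: BE Word32, hCount :: LE Word16 }
  deriving (Show, Eq)

decodeHeader :: [Word8] -> Maybe (Header, [Word8])
decodeHeader bs = do
  (m, r1) <- fromBytes bs
  (c, r2) <- fromBytes r1
  return (Header m c, r2)
```

The alternative is a run-time byte-order argument passed to every decoder. Encoding it in the types lets generated instances choose per field without extra arguments, at the cost of wrapping and unwrapping values.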

The ability to switch easily from Parsec to an efficient UArray parser,
and the benefit of specializing 'times', came as nice surprises.
Well, not entirely unexpected ones, because I was striving for them a
bit, but it was easier than I thought.
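A sketch of why switching backends can be cheap, assuming the record decoders are written against a small class with a single byte-reading primitive. ByteSource, ListDec (a list-based stand-in for Parsec, to avoid extra packages), ArrDec and times are hypothetical names; the code uses the transformers package that ships with GHC. The SPECIALIZE pragmas are only one possible reading of "specializing times": they ask GHC to compile a copy of times for each concrete decoder instead of passing a Monad dictionary at run time.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- Hypothetical backend-independent interface.
class Monad m => ByteSource m where
  getWord8 :: m Word8

-- Backend A: consumes a list of bytes.
newtype ListDec a = ListDec (StateT [Word8] Maybe a)
  deriving (Functor, Applicative, Monad)

instance ByteSource ListDec where
  getWord8 = ListDec $ do
    bs <- get
    case bs of
      (b : rest) -> put rest >> return b
      []         -> lift Nothing

runListDec :: ListDec a -> [Word8] -> Maybe a
runListDec (ListDec m) = evalStateT m

-- Backend B: reads directly from an unboxed array, advancing an index.
newtype ArrDec a = ArrDec (StateT (UArray Int Word8, Int) Maybe a)
  deriving (Functor, Applicative, Monad)

instance ByteSource ArrDec where
  getWord8 = ArrDec $ do
    (buf, i) <- get
    let (lo, hi) = bounds buf
    if i >= lo && i <= hi
      then put (buf, i + 1) >> return (buf ! i)
      else lift Nothing

runArrDec :: ArrDec a -> UArray Int Word8 -> Maybe a
runArrDec (ArrDec m) buf = evalStateT m (buf, fst (bounds buf))

-- Run a decoder n times, collecting the results.
times :: Monad m => Int -> m a -> m [a]
times n p
  | n <= 0    = return []
  | otherwise = do
      x  <- p
      xs <- times (n - 1) p
      return (x : xs)
{-# SPECIALIZE times :: Int -> ListDec a -> ListDec [a] #-}
{-# SPECIALIZE times :: Int -> ArrDec a -> ArrDec [a] #-}

-- Decoders written once against the class run on either backend.
word16be :: ByteSource m => m Word16
word16be = do
  hi <- getWord8
  lo <- getWord8
  return (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- A count byte followed by that many big-endian 16-bit words.
word16s :: ByteSource m => m [Word16]
word16s = do
  n <- getWord8
  times (fromIntegral n) word16be
```

Switching from one backend to the other only changes which run function is called; word16s itself stays the same.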

Template Haskell is used to automatically derive instances for record
types. In my application I had to create some instances by hand, because
some regions of the binary files were not self-describing - they needed
additional information from the outside.
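A minimal sketch of that distinction, with hypothetical names throughout: self-describing values get an ordinary class instance (the kind TH could derive), while a region whose length is stored elsewhere gets a hand-written decoder that takes that length as an argument. It only illustrates the idea and is not the attached code.

```haskell
import Data.Word (Word8)

-- Hypothetical decoder type: consume bytes, return the value and the rest.
type Decoder a = [Word8] -> Maybe (a, [Word8])

-- Self-describing values: a plain instance is enough.
class Decodable a where
  decode :: Decoder a

instance Decodable Word8 where
  decode (b : rest) = Just (b, rest)
  decode []         = Nothing

-- Not self-describing: the length is known only from the outside.
newtype Payload = Payload [Word8] deriving (Show, Eq)

decodePayload :: Int -> Decoder Payload
decodePayload n bs
  | length chunk == n = Just (Payload chunk, rest)
  | otherwise         = Nothing
  where (chunk, rest) = splitAt n bs

-- Example: a one-byte header supplies the length of the payload that follows.
decodeFile :: Decoder Payload
decodeFile bs = do
  (len, rest) <- decode bs :: Maybe (Word8, [Word8])
  decodePayload (fromIntegral len) rest
```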

There is also an encoding part - if someone's interested, I can extract
it from the application. The funny thing is that I was only supposed to
produce files, not parse them. But I started by writing a parser, and
thanks to the declarative approach, the unparser was ready almost
instantly once the parser was finished.
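Why the unparser comes almost for free can be sketched as follows, assuming each field is described by a plain value that two small interpreters read: one turns the description into a decoder, the other into an encoder. FieldDesc, decodeRecord and encodeRecord are hypothetical, and only unsigned big-endian fields are covered.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8)

-- Hypothetical field description: an unsigned big-endian integer of n bytes.
data FieldDesc = UnsignedBE Int
  deriving Show

-- Interpreter 1: the parser.
decodeField :: FieldDesc -> [Word8] -> Maybe (Integer, [Word8])
decodeField (UnsignedBE n) bs
  | length chunk == n = Just (foldl step 0 chunk, rest)
  | otherwise         = Nothing
  where
    (chunk, rest) = splitAt n bs
    step acc b = acc `shiftL` 8 .|. fromIntegral b

-- Interpreter 2: the unparser, driven by the same description.
encodeField :: FieldDesc -> Integer -> [Word8]
encodeField (UnsignedBE n) v =
  [ fromIntegral (v `shiftR` (8 * i)) | i <- [n - 1, n - 2 .. 0] ]

-- A record is just a list of field descriptions.
decodeRecord :: [FieldDesc] -> [Word8] -> Maybe ([Integer], [Word8])
decodeRecord [] bs = Just ([], bs)
decodeRecord (d : ds) bs = do
  (v, rest)   <- decodeField d bs
  (vs, rest') <- decodeRecord ds rest
  return (v : vs, rest')

encodeRecord :: [FieldDesc] -> [Integer] -> [Word8]
encodeRecord ds vs = concat (zipWith encodeField ds vs)
```

Adding a new kind of field means extending both interpreters once; every record built from the descriptions then gets both directions.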


One way to introduce bitfields and varying endianness would be to design
a description language like Erlang's bit syntax (an algebraic datatype
should suffice). Then we could generate datatypes and class instances
from it using TH. Something like this (just a sketch):

ipHeaderLayout =
    Record
        "IPHeader"
        [ BitField "Word8" BE [ (4, Just ("iphHeaderLen", "!Word8"))
                              , (4, Just ("iphVersion", "!Word8"))
                              ]
        , Field "iphTOS"       (Unsigned 1 BE)  "!Word8"
        , Field "iphTotalLen"  (Unsigned 2 BE)  "!Word16"
        , Field "iphID"        (Unsigned 2 BE)  "!Word16"
        , Field "iphFragOff"   (Unsigned 2 BE)  "!Word16"
        , Field "iphTTL"       (Unsigned 1 BE)  "!Word8"
        , Field "iphProtocol"  (Unsigned 1 BE)  "!Word8"
        , Field "iphCheck"     (Unsigned 2 BE)  "!Word16"
        , Field "iphSAddr"     (Unsigned 4 BE)  "!Word32"
        , Field "iphDAddr"     (Unsigned 4 BE)  "!Word32"
        ]

$(createDataType ipHeaderLayout)
$(createDecodableClassInstance ipHeaderLayout)
$(createEncodableClassInstance ipHeaderLayout)
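To check that the sketch hangs together, here are datatypes under which ipHeaderLayout typechecks, plus a run-time interpreter used in place of the TH generators (createDataType and friends are not implemented here). The type names Layout, Item, Encoding and Endianness, the LE constructor, interpret, and the rule that BitField groups are listed from the least significant bits upward are all my assumptions. Under that rule the first nibble in the list is the low one, which matches the IPv4 header, where the version sits in the high nibble.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical types for the description language.
data Endianness = BE | LE deriving (Show, Eq)

data Encoding = Unsigned Int Endianness   -- width in bytes, byte order
  deriving Show

data Item
  = Field String Encoding String          -- name, wire encoding, Haskell type
  | BitField String Endianness [(Int, Maybe (String, String))]
      -- container type, byte order, (bit width, optional name and type);
      -- assumption: groups are listed from the least significant bits up
  deriving Show

data Layout = Record String [Item] deriving Show

ipHeaderLayout :: Layout
ipHeaderLayout =
  Record "IPHeader"
    [ BitField "Word8" BE [ (4, Just ("iphHeaderLen", "!Word8"))
                          , (4, Just ("iphVersion", "!Word8")) ]
    , Field "iphTOS"      (Unsigned 1 BE) "!Word8"
    , Field "iphTotalLen" (Unsigned 2 BE) "!Word16"
    , Field "iphID"       (Unsigned 2 BE) "!Word16"
    , Field "iphFragOff"  (Unsigned 2 BE) "!Word16"
    , Field "iphTTL"      (Unsigned 1 BE) "!Word8"
    , Field "iphProtocol" (Unsigned 1 BE) "!Word8"
    , Field "iphCheck"    (Unsigned 2 BE) "!Word16"
    , Field "iphSAddr"    (Unsigned 4 BE) "!Word32"
    , Field "iphDAddr"    (Unsigned 4 BE) "!Word32"
    ]

-- Read n bytes as an unsigned integer in the given byte order.
takeUnsigned :: Int -> Endianness -> [Word8] -> Maybe (Integer, [Word8])
takeUnsigned n e bs
  | length chunk < n = Nothing
  | otherwise        = Just (foldl step 0 ordered, rest)
  where
    (chunk, rest) = splitAt n bs
    ordered = if e == BE then chunk else reverse chunk
    step acc b = acc `shiftL` 8 .|. fromIntegral b

-- Split a value into bit groups, lowest group first; unnamed groups are padding.
splitBits :: Integer -> [(Int, Maybe (String, String))] -> [(String, Integer)]
splitBits _ [] = []
splitBits v ((w, mname) : rest) =
  [ (name, v .&. (2 ^ w - 1)) | Just (name, _) <- [mname] ]
    ++ splitBits (v `shiftR` w) rest

-- Interpret a layout at run time, producing (field name, value) pairs.
interpret :: Layout -> [Word8] -> Maybe [(String, Integer)]
interpret (Record _ items) = go items
  where
    go [] _ = Just []
    go (it : rest) bs = do
      (vals, bs') <- item it bs
      more <- go rest bs'
      return (vals ++ more)

    item (Field name (Unsigned n e) _) bs = do
      (v, bs') <- takeUnsigned n e bs
      return ([(name, v)], bs')
    item (BitField ty e parts) bs = do
      n <- lookup ty [("Word8", 1), ("Word16", 2), ("Word32", 4)]
      (v, bs') <- takeUnsigned n e bs
      return (splitBits v parts, bs')
```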


Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deser.tgz
Type: application/x-gzip
Size: 7747 bytes
Desc: not available
Url : http://haskell.org/pipermail/haskell-cafe/attachments/20040128/bfc31cff/deser.bin

