[Haskell-cafe] Parsec to parse tree structures?

Stephen Tetley stephen.tetley at gmail.com
Sun Mar 14 15:09:49 EDT 2010


Hi David

Ah ha - this form of binary file layout is quite common (e.g. PECOFF
object files and OpenType / TrueType fonts).

Parsec and other parsing libraries are perhaps not ideal for the task,
as they consume input sequentially as they parse, whereas a layout like
this needs jumps to offsets stored in the file. I have my own alternative
to Parsec - Kangaroo [1] - for parsing binary files. It moves a cursor
around inside the file (strictly speaking an array in memory from
reading the file), so you can parse within a sub-region of the file
and jump back out again.
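
To make the cursor idea concrete, here is a minimal sketch of the
technique. It is not Kangaroo's actual API: the names Cursor, Parser,
word8, word16be, region and parse are all made up for illustration, and
the only assumption is that the whole file is held in memory as a strict
ByteString.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- Hypothetical cursor parser (NOT Kangaroo's API): the input is never
-- consumed; the state is just a position and the end of the current region.
data Cursor = Cursor { pos :: Int, end :: Int }

newtype Parser a = Parser
  { runParser :: B.ByteString -> Cursor -> Either String (a, Cursor) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \bs c -> case p bs c of
    Left e        -> Left e
    Right (a, c') -> Right (f a, c')

instance Applicative Parser where
  pure a = Parser $ \_ c -> Right (a, c)
  Parser pf <*> Parser pa = Parser $ \bs c -> case pf bs c of
    Left e        -> Left e
    Right (f, c') -> case pa bs c' of
      Left e         -> Left e
      Right (a, c'') -> Right (f a, c'')

instance Monad Parser where
  Parser p >>= k = Parser $ \bs c -> case p bs c of
    Left e        -> Left e
    Right (a, c') -> runParser (k a) bs c'

-- Read one byte at the cursor and advance, failing at the region's end.
word8 :: Parser Word8
word8 = Parser $ \bs (Cursor p e) ->
  if p < e
    then Right (B.index bs p, Cursor (p + 1) e)
    else Left ("read past end of region at offset " ++ show p)

-- Read a big-endian 16-bit word.
word16be :: Parser Word16
word16be = do
  hi <- word8
  lo <- word8
  pure ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)

-- Run a parser inside the absolute region [start, start + len), then
-- restore the outer cursor so parsing continues where it left off.
region :: Int -> Int -> Parser a -> Parser a
region start len (Parser p) = Parser $ \bs outer ->
  if start < 0 || len < 0 || start + len > B.length bs
    then Left "region outside input"
    else case p bs (Cursor start (start + len)) of
      Left e       -> Left e
      Right (a, _) -> Right (a, outer)

-- Run a parser over the whole input, starting at offset 0.
parse :: Parser a -> B.ByteString -> Either String a
parse (Parser p) bs = fmap fst (p bs (Cursor 0 (B.length bs)))

-- Toy layout (made up): [offset, length, tag, ...table bytes...]
example :: Parser (Word16, Word8)
example = do
  off <- word8
  len <- word8
  val <- region (fromIntegral off) (fromIntegral len) word16be
  tag <- word8  -- resumes directly after the two header bytes
  pure (val, tag)
```

In GHCi, parse example (B.pack [4, 2, 0xAA, 0x00, 0x01, 0x02]) gives
Right (258,170): the header points at offset 4, the region parser reads
0x0102 from there, and the outer parser then carries on at offset 2 as
if the jump had never happened.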

Although it's on Hackage, I wouldn't really recommend its use - it's now
fairly well documented, but the API is not stable and I only work on it
sporadically. Because I didn't want any dependencies, the package is
quite a bit larger than it needs to be - if someone were interested in
the technique they might be better off using it as a starting point. The
most important bits are the 'intraparse' function and the monadic
machinery inside the Kangaroo.ParseMonad module.

Even when a binary format has a published standard, unfortunately the
standard might not be detailed enough to actually produce a parser.
This is the case for TrueType and PECOFF, which I wrote Kangaroo for,
and as I don't have much enthusiasm for deriving a parser from another
open-source implementation, this is rather stalling any continued
development of Kangaroo.


[1] http://hackage.haskell.org/package/kangaroo

Best wishes

Stephen

