[Haskell-cafe] Parsing binary 'hierachical' objects for lazy developers

John Obbele john.obbele at gmail.com
Sat Apr 30 14:07:53 CEST 2011


On Wed, Apr 27, 2011 at 10:18:47PM +0100, Stephen Tetley wrote:
> On 27 April 2011 21:28, Alexander Solla <alex.solla at gmail.com> wrote:
> > On Wed, Apr 27, 2011 at 11:16 AM, John Obbele <john.obbele at gmail.com> wrote:
> >>
> >> Second issue, I would like to find a way to dispatch parsers. I'm
> >> not very good at expressing my problem in english, so I will use
> >> another code example:
> >
> > This sounds very hard in the general case.  Others have shown you how to
> > dispatch on two types.  But there is no general data type which combines all
> > (or even arbitrarily many) types.  Somehow, "Read" is able to do this, but I
> > don't know what kind of magic it uses.
>
> Read always "demands its type" so it doesn't use any magic - if the
> input string doesn't conform it will throw an error.
> 
> Any sensible binary format will have a scheme such as tag byte
> prefixes to control choice in parsing (binary parsing generally avoids
> all backtracking). If your binary data doesn't have a proper scheme it
> will be hard to parse for any language (or cast-to in the case of C),
> so the most sensible answer is to revise the format.


Oki, so far the use of the Control.Applicative magic, the syntax
sugar for monadic operations and manually written it/then/else or
'case of' branching statements have helped me considerably in the
parsing task.

I have not try DrIFT since I prefer to avoid pre-processors for
now.

So the only quirk that is still upsetting me is the 'deriving'
issue: if I know that what I am parsing could only result in
ObjectA or ObjectB, every thing would be simple.

But when someone decides to add an extension to the binary
format, let's say add a new tag identifier and a new ObjectC with
a different size and new attributes, I will have to re-write part
of my Haskell parser.

I think, I will just have to rewrite the abstract type to
'data AbstractObject = ObjectA | ObjectB | ObjectC'
let my parser still have the type signature:
'parser :: B.ByteString -> AbstractObject'
and modify the branching inside to add the C tag identifier:
'case tag identifier of
    A -> ...
    B -> ...
    C -> ...'.

It's not straightforward but it should be manageable.

regards,
/john



More information about the Haskell-Cafe mailing list