[Haskell-Cafe] Parsing bytestream

Tue Nov 9 05:09:38 EST 2010

2010/11/9 C K Kashyap <ckkashyap at gmail.com>:
> Thanks Stephen,
>
> On Tue, Nov 9, 2010 at 2:53 PM, Stephen Tetley <stephen.tetley at gmail.com> wrote:
>> I'd use a parser combinator library that has word8, word16, word32
>> combinators. The latter should really have big and little endian
>> versions word16be, word16le, word32be, word32le.
>>
>> Data.Binary should provide this and Attoparsec I think. Usually I roll
>> my own, but only because I had my own libraries before these two
>> existed.
>>
>> The idiom of a tag byte telling you what comes next is very common in
>> binary formats. It means parsers can avoid backtracking altogether.
>
> I'll take a look at attoparsec
>
> I was also trying to understand how I could do it myself:
>
> Basically I've been using the Get monad to get word8/word16 values
> etc. out of a byte stream, but I don't want to write a separate parsing
> routine for each command.
>
> So instead of doing something like this -
>
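> import Data.Binary.Get (runGet, getWord8, getWord16be)
>
> -- Command 1: one byte followed by a big-endian 16-bit word.
> parseCommand1 byteStream = flip runGet byteStream $ do
>     b1 <- getWord8
>     b2 <- getWord16be
>     return (b1, b2)
>
> -- Command 2: two big-endian 16-bit words.
> parseCommand2 byteStream = flip runGet byteStream $ do
>     b1 <- getWord16be
>     b2 <- getWord16be
>     return (b1, b2)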
>
> I'd like to do this
>
> parse byteStream command = runGet $ do
>     map (commandFormat command)
>     -- or something like this - not exactly sure about this.

Hi,

This doesn't seem like a good idea to me. In the first case, once you
have parsed your data, you end up with very specific data structures
that can be processed later as-is. In the second case, you end up with
the same kind of list for every command, so you're bound to "parse"
that list again to find out what you're dealing with.

In the first case, the only way to fail is to parse malformed input,
and you produce solid data you can work with. In the second case, you
have a very weak representation that needs more work afterward, and
that work is very similar to the parsing you did in the first place.
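
To make the first approach concrete, here is a minimal sketch of one
parser that still returns specific data. It assumes, purely for
illustration, that each command starts with a tag byte (0x01 or 0x02);
the Command type, the tag values and the names getCommand and
parseCommand are made up for this example and are not taken from your
format. It also uses runGetOrFail, which needs a newer version of the
binary package than some installations have.

```haskell
import Data.Binary.Get (Get, getWord8, getWord16be, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16)

-- Hypothetical command set; one constructor per wire layout.
data Command
  = Command1 Word8 Word16   -- one byte, then a big-endian 16-bit word
  | Command2 Word16 Word16  -- two big-endian 16-bit words
  deriving (Show, Eq)

-- Read the tag byte, then dispatch to the matching field layout.
-- The tag values 0x01 and 0x02 are assumptions for this sketch.
getCommand :: Get Command
getCommand = do
  tag <- getWord8
  case tag of
    0x01 -> Command1 <$> getWord8 <*> getWord16be
    0x02 -> Command2 <$> getWord16be <*> getWord16be
    _    -> fail ("unknown command tag: " ++ show tag)

-- Malformed or truncated input becomes a Left instead of an exception.
parseCommand :: BL.ByteString -> Either String Command
parseCommand bs =
  case runGetOrFail getCommand bs of
    Left  (_, _, err) -> Left err
    Right (_, _, cmd) -> Right cmd
```

With these definitions, parseCommand (BL.pack [0x01, 0x2A, 0x00, 0x10])
evaluates to Right (Command1 42 16). Code that consumes the result can
pattern match on the constructor, so nothing has to be parsed a second
time.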

Cheers,
Thu