[Haskell-cafe] Parsing binary data question

Michael Oswald muell_om at gmx.net
Tue Sep 27 11:14:14 CEST 2011


Hello all,

I am currently working on parser for some packets received via the 
network. The data structure currently is like that:


data Value = ValUInt8 Int8
            | ValUInt16 Int16
            | ValUInt32 Int32
	-- more datatypes

data Parameter = Parameter {
   paramName :: String,
   paramValue :: Value
   }
   | ParameterN {
   paramName :: String,
   paramValue :: Value
   }deriving (Show)

data TCPacket = TCPacket {
   tcAPID :: Word16,
   tcType :: Word8,
   tcSubType :: Word8,
   tcParameters :: [Parameter]
   }

The output should a parsed packet (I am using cereal for this). The 
packet structure can vary depending on the type and the configuration, 
so I have a function which takes a TCPacket as "input template" which 
has already the correct list of parameters which are then parsed:

parseTCPacket :: Word16 -> Word8 -> Word8 -> ByteString -> TCPacket -> 
TCPacket
parseTCPacket apid t st pktData tmplate =
     TCPacket apid t st params
     where
         tmplParams = (tcParameters tmplate)
         params = zipWith (\p v -> p {paramValue = v} ) tmplParams values'
         values = map paramValue tmplParams
         values' = binValues values (pktData pusPkt)

getBinGet :: Value -> Get Value
getBinGet (ValInt8 _) = getWord8 >>= \x -> return $ ValInt8 $ fromIntegral x
getBinGet (ValInt16 _) = getWord16be >>= \x -> return $ ValInt16 $ 
fromIntegral x
-- many more datatypes

getBinValues :: [Value] -> Get [Value]
getBinValues inp = mapM getBinGet inp


binValues :: [Value] -> ByteString -> ([Value], B.ByteString)
binValues inp bytes = case runGet (getBinValues inp) bytes of
                         Left err -> throw $ DecodeError ("binValues: " 
++ err)
                         Right x -> x


This works quite well and does what I want. Now I have the problem that 
there are some parameters, which could be so-called "group repeaters" 
(the ParameterN constructor above). This means, that if such a parameter 
N is encountered during parsing (it has to be an int type), all 
following parameters are repeated N times with possible nesting.

So for example if the template (where the values are all 0) is like this:
[Parameter "Param1" (ValUInt32 0), ParameterN "N1" (ValUInt8 0), 
Parameter "Param2" (ValUint16 0), ParameterN "N2" (ValUint8 0),
Parameter "Param3" (ValUint8 0)]

Which means there is a group for the last 3 parameters which is repeated 
N1 times which contains another group which is repeated N2 times.
If binary data based on the template above would be like this (datatypes 
omitted):

10, 2, 439, 2, 12, 13, 65535, 2, 22, 23

then a valid packet after parsing would be:

[Parameter "Param1" (ValUint32 10), ParameterN "N1" (ValUint8 2), 
Parameter "Param2" (ValUint16 439), ParameterN "N2" (ValUint8 2),
Parameter "Param3" (ValUint8 12), Parameter "Param3" (ValUint8 13),
Parameter "Param2" (ValUint16 65535), ParameterN "N2" (ValUint8 2),
Parameter "Param3" (ValUint8 22), Parameter "Param3" (ValUint8 23)]

Now I am a bit lost on how to implement such a parser. It would be much 
easier if the structure would be already encoded in the binary data, but 
I have to stick to this template approach. I have some C++ parser which 
does this but of course it's very imperative and a little bit quirky 
implemented, so if anybody has an idea on how to proceed (cereal, 
attoparsec whatever), please tell me.


lg,
Michael






More information about the Haskell-Cafe mailing list