[Haskell-cafe] Parsing binary data question
Michael Oswald
muell_om at gmx.net
Tue Sep 27 11:14:14 CEST 2011
Hello all,
I am currently working on parser for some packets received via the
network. The data structure currently is like that:
data Value = ValUInt8 Int8
| ValUInt16 Int16
| ValUInt32 Int32
-- more datatypes
data Parameter = Parameter {
paramName :: String,
paramValue :: Value
}
| ParameterN {
paramName :: String,
paramValue :: Value
}deriving (Show)
data TCPacket = TCPacket {
tcAPID :: Word16,
tcType :: Word8,
tcSubType :: Word8,
tcParameters :: [Parameter]
}
The output should a parsed packet (I am using cereal for this). The
packet structure can vary depending on the type and the configuration,
so I have a function which takes a TCPacket as "input template" which
has already the correct list of parameters which are then parsed:
parseTCPacket :: Word16 -> Word8 -> Word8 -> ByteString -> TCPacket ->
TCPacket
parseTCPacket apid t st pktData tmplate =
TCPacket apid t st params
where
tmplParams = (tcParameters tmplate)
params = zipWith (\p v -> p {paramValue = v} ) tmplParams values'
values = map paramValue tmplParams
values' = binValues values (pktData pusPkt)
getBinGet :: Value -> Get Value
getBinGet (ValInt8 _) = getWord8 >>= \x -> return $ ValInt8 $ fromIntegral x
getBinGet (ValInt16 _) = getWord16be >>= \x -> return $ ValInt16 $
fromIntegral x
-- many more datatypes
getBinValues :: [Value] -> Get [Value]
getBinValues inp = mapM getBinGet inp
binValues :: [Value] -> ByteString -> ([Value], B.ByteString)
binValues inp bytes = case runGet (getBinValues inp) bytes of
Left err -> throw $ DecodeError ("binValues: "
++ err)
Right x -> x
This works quite well and does what I want. Now I have the problem that
there are some parameters, which could be so-called "group repeaters"
(the ParameterN constructor above). This means, that if such a parameter
N is encountered during parsing (it has to be an int type), all
following parameters are repeated N times with possible nesting.
So for example if the template (where the values are all 0) is like this:
[Parameter "Param1" (ValUInt32 0), ParameterN "N1" (ValUInt8 0),
Parameter "Param2" (ValUint16 0), ParameterN "N2" (ValUint8 0),
Parameter "Param3" (ValUint8 0)]
Which means there is a group for the last 3 parameters which is repeated
N1 times which contains another group which is repeated N2 times.
If binary data based on the template above would be like this (datatypes
omitted):
10, 2, 439, 2, 12, 13, 65535, 2, 22, 23
then a valid packet after parsing would be:
[Parameter "Param1" (ValUint32 10), ParameterN "N1" (ValUint8 2),
Parameter "Param2" (ValUint16 439), ParameterN "N2" (ValUint8 2),
Parameter "Param3" (ValUint8 12), Parameter "Param3" (ValUint8 13),
Parameter "Param2" (ValUint16 65535), ParameterN "N2" (ValUint8 2),
Parameter "Param3" (ValUint8 22), Parameter "Param3" (ValUint8 23)]
Now I am a bit lost on how to implement such a parser. It would be much
easier if the structure would be already encoded in the binary data, but
I have to stick to this template approach. I have some C++ parser which
does this but of course it's very imperative and a little bit quirky
implemented, so if anybody has an idea on how to proceed (cereal,
attoparsec whatever), please tell me.
lg,
Michael
More information about the Haskell-Cafe
mailing list