[Haskell-cafe] Using Get monad to efficiently parse byte-stuffed data

Pom pommonico at gmail.com
Thu Mar 25 00:06:21 EDT 2010


On 10-03-24 12:44 PM, Paul Johnson wrote:
> On 24/03/10 04:36, Pom Monico wrote:
>> Hello all,
>>
>> I'm struggling to use the Get monad to efficiently parse the some
>> binary data of the format below. I simply can't find a way to use the
>> monad to parse it efficiently.
>>
>> Binary data is terminated by a 0xFF marker. If the data itself
>> contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is
>> part of the data. An 0xFF followed by a non-zero byte indicates that
>> it is a real marker and terminates the data and the 0xFF is not part
>> of the data.
> So the only way to find out if you have reached the end of the data is 
> to read beyond it?  Yuck!
>
> I haven't used it myself, but I suggest taking a look at Attoparsec, 
> which is like Parsec but for bytestrings.  That might be able to 
> handle this kind of non-determinism.
Don't know who thought it up but I have to deal with it.

The best I've gotten so far is the code below. In the Get monad, I do a 
"lookAhead getRemainingLazyByteString", call "go s 0 []" to get the 
byte-unstuffed data and #bytes consumed, and then do a "skip consumed" 
to move forward in the Get monad.

It is ugly and slow. I would like to implement this using the Get monad 
without dropping into the ByteString, and also so speed it up. In order 
of most time consuming, the sections are: go, takeToFF, ECS5-4, ECS5-2. 
But I'm also out of ideas without resorting to mutable constructs, 
esoteric tricks, or things that look like it belongs in a .c file.

Pom.

ff00 = {-# SCC "ECS5-8" #-} L.pack [0xff, 0x00]
ff   = {-# SCC "ECS5-9" #-} L.singleton 0xff
takeToFF s = case L.elemIndex 0xff s of
     Nothing -> (s, L.empty, L.length s) -- Return entire string
     Just x  -> let (h, t) = L.splitAt x s in (h, t, x) -- Split before 0xff
go s n acc
     | L.null s            = {-# SCC "ECS5-1" #-} (L.concat $ reverse 
acc, n)
     | L.isPrefixOf ff00 s = {-# SCC "ECS5-2" #-} go (L.drop 2 s) (n+2) 
(ff:acc)
     | L.isPrefixOf ff s   = {-# SCC "ECS5-3" #-} (L.concat $ reverse 
acc, n)
     | otherwise           = {-# SCC "ECS5-4" #-} let (h, t, x) = 
takeToFF s in go t (n+x) (h:acc)



More information about the Haskell-Cafe mailing list