[Haskell-cafe] Using Get monad to efficiently parse byte-stuffed
data
Pom
pommonico at gmail.com
Thu Mar 25 00:06:21 EDT 2010
On 10-03-24 12:44 PM, Paul Johnson wrote:
> On 24/03/10 04:36, Pom Monico wrote:
>> Hello all,
>>
>> I'm struggling to use the Get monad to efficiently parse the some
>> binary data of the format below. I simply can't find a way to use the
>> monad to parse it efficiently.
>>
>> Binary data is terminated by a 0xFF marker. If the data itself
>> contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is
>> part of the data. An 0xFF followed by a non-zero byte indicates that
>> it is a real marker and terminates the data and the 0xFF is not part
>> of the data.
> So the only way to find out if you have reached the end of the data is
> to read beyond it? Yuck!
>
> I haven't used it myself, but I suggest taking a look at Attoparsec,
> which is like Parsec but for bytestrings. That might be able to
> handle this kind of non-determinism.
Don't know who thought it up but I have to deal with it.
The best I've gotten so far is the code below. In the Get monad, I do a
"lookAhead getRemainingLazyByteString", call "go s 0 []" to get the
byte-unstuffed data and #bytes consumed, and then do a "skip consumed"
to move forward in the Get monad.
It is ugly and slow. I would like to implement this using the Get monad
without dropping into the ByteString, and also so speed it up. In order
of most time consuming, the sections are: go, takeToFF, ECS5-4, ECS5-2.
But I'm also out of ideas without resorting to mutable constructs,
esoteric tricks, or things that look like it belongs in a .c file.
Pom.
ff00 = {-# SCC "ECS5-8" #-} L.pack [0xff, 0x00]
ff = {-# SCC "ECS5-9" #-} L.singleton 0xff
takeToFF s = case L.elemIndex 0xff s of
Nothing -> (s, L.empty, L.length s) -- Return entire string
Just x -> let (h, t) = L.splitAt x s in (h, t, x) -- Split before 0xff
go s n acc
| L.null s = {-# SCC "ECS5-1" #-} (L.concat $ reverse
acc, n)
| L.isPrefixOf ff00 s = {-# SCC "ECS5-2" #-} go (L.drop 2 s) (n+2)
(ff:acc)
| L.isPrefixOf ff s = {-# SCC "ECS5-3" #-} (L.concat $ reverse
acc, n)
| otherwise = {-# SCC "ECS5-4" #-} let (h, t, x) =
takeToFF s in go t (n+x) (h:acc)
More information about the Haskell-Cafe
mailing list