[Haskell-cafe] data.binary get reading beyond end of input
greg at gregorycollins.net
Wed Jul 28 10:32:16 EDT 2010
Conrad Parker <conrad at metadecks.org> writes:
> I am reading data from a file as strict bytestrings and processing
> them in an iteratee. As the parsing code uses Data.Binary, the
> strict bytestrings are then converted to lazy bytestrings (using
> fromWrap which Gregory Collins posted here in January:
> -- | wrapped bytestring -> lazy bytestring
> fromWrap :: I.WrappedByteString Word8 -> L.ByteString
> fromWrap = L.fromChunks . (:[]) . I.unWrap
This just makes a 1-chunk lazy bytestring:
(L.fromChunks . (:[])) :: S.ByteString -> L.ByteString
> ). The parsing is then done with the library function
> -- | Run the Get monad applies a 'get'-based parser on the input
> -- ByteString. Additional to the result of get it returns the number of
> -- consumed bytes and the rest of the input.
> runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
> The issue I am seeing is that runGetState consumes more bytes than the
> length of the input bytestring, while reporting an
> apparently successful get (ie. it does not call error/fail). I was
> able to work around this by checking if the bytes consumed > input
> length, and if so to ignore the result of get and simply prepend the
> input bytestring to the next chunk in the continuation.
Something smells fishy here. I have a hard time believing that binary is
reading more input than is available. Could you post more code, please?
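In the meantime, here is a minimal sketch of how the bounds question could be
checked in isolation. It assumes a binary release that provides runGetOrFail
(0.7 or later), which is not the runGetState interface quoted above. The names
toLazy and parseChunk are hypothetical helpers for illustration, not part of
binary.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (Get, getWord32be, runGetOrFail)
import Data.Int (Int64)

-- | Hypothetical helper: wrap a strict bytestring as a one-chunk lazy one.
toLazy :: S.ByteString -> L.ByteString
toLazy = L.fromChunks . (:[])

-- | Hypothetical helper: run a 'Get' parser over a single strict chunk.
-- Assumes binary >= 0.7, where 'runGetOrFail' is available.
-- 'Left' carries the parser's error message (for example when the chunk
-- is too short); 'Right' carries the result, the unconsumed input and the
-- number of bytes consumed.
parseChunk :: Get a -> S.ByteString -> Either String (a, L.ByteString, Int64)
parseChunk p chunk =
  case runGetOrFail p (toLazy chunk) of
    Left  (_, _, err)     -> Left err
    Right (rest, used, x) -> Right (x, rest, used)
```

In GHCi, parseChunk getWord32be (S.pack [0,0]) should come back as a Left,
since two bytes cannot satisfy a four-byte read. In that case the chunk can be
prepended to the next one, much like the workaround described above.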
> However I am curious as to why this apparent lack of bounds checking
> happens. My guess is that Get does not check the length of the input
> bytestring, perhaps to avoid forcing lazy bytestring inputs; does that
> make sense?
> Would a better long-term solution be to use a strict-bytestring binary
> parser (like cereal)? So far I've avoided that as there is
> not yet a corresponding ieee754 parser.
If you're using iteratees you could try attoparsec + attoparsec-iteratee
which would be a more natural way to bolt parsers together. The
attoparsec-iteratee package exports:
parserToIteratee :: (Monad m) =>
                    Parser a
                 -> IterateeG WrappedByteString Word8 m a
Attoparsec is an incremental parser so this technique allows you to
parse a stream in constant space (i.e. without necessarily having to
retain all of the input). It also hides the details of the annoying
buffering/bytestring twiddling you would be forced to do otherwise.
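
To illustrate what "incremental" means here, the following is a rough sketch
that feeds strict chunks to an attoparsec parser by hand, which is roughly the
bookkeeping the iteratee wrapper takes care of. It assumes a current
attoparsec release exposing Data.Attoparsec.ByteString (the module layout in
older releases differs). The record format and the names record, feedChunks
and parseChunks are made up for the example.

```haskell
import qualified Data.ByteString as S
import qualified Data.Attoparsec.ByteString as A

-- | Hypothetical record format: one length byte, then that many payload bytes.
record :: A.Parser S.ByteString
record = do
  n <- A.anyWord8
  A.take (fromIntegral n)

-- | Keep feeding chunks while the parser asks for more input.
-- Stops at the first 'Done' or 'Fail', or when the chunks run out.
-- Any chunks left over after 'Done' are dropped in this sketch.
-- Note that attoparsec treats an empty chunk as end of input.
feedChunks :: A.Result a -> [S.ByteString] -> A.Result a
feedChunks r@(A.Partial _) (c:cs) = feedChunks (A.feed r c) cs
feedChunks r _                    = r

-- | Run a parser over a list of strict chunks, one chunk at a time.
parseChunks :: A.Parser a -> [S.ByteString] -> A.Result a
parseChunks p []     = A.parse p S.empty
parseChunks p (c:cs) = feedChunks (A.parse p c) cs
```

With the chunks [S.pack [3,1], S.pack [2], S.pack [3,9]], the parser stays
Partial until the third chunk arrives, and only the bytes it has not yet
consumed need to be kept around between chunks.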
Gregory Collins <greg at gregorycollins.net>