A fancier Get monad or two (a la binary and binary-strict)
Chris Kuklewicz
haskell at list.mightyreason.com
Wed Jul 30 17:34:19 EDT 2008
Johan Tibell wrote:
> I've written what I believe to be a similar, continuation-based
> parser. I haven't uploaded my latest patches (basically faster
> combinators) but the idea can be seen in the file here:
>
> http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Parser.hs;h=8086b11bfeb3bca15bfd16ec9c6a4b34aadf528e;hb=HEAD
>
> The use case is parsing HTTP without resorting to lazy I/O.
I have just read through your code. It is quite similar.
The differences:
Your error handling is via Alternative, and if the first branch advances
(consumes input) then the second branch is not attempted. The state can only go
forward (by 1 byte) or remain in place (there is no look-ahead). When reading
past the end of the current input, either (*) the first new byte works, or
(*) the first new byte fails and the Alternative is ready to try the other branch.
In MyGet/MyGetW/MyGetSimplified the MonadError semantics are different. If
the first alternative fails then the parser state is rolled back and the second
alternative is tried from the same starting point as the first was tried. If
the first alternative triggered more input from IPartial then this input is still
visible to the second alternative.
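
To make the difference concrete, here is a minimal sketch of the two choice semantics on a toy parser. It is not the code from Hyena or from MyGet: the names Parser, char, orCommit, orRollback and parse are made up for illustration, and the input is a plain String that is fully available, so the IPartial / incremental-input part is not modelled at all.

```haskell
import Control.Monad (ap, liftM)

-- Toy parser over a fully available String (no incremental input).
-- The Bool records whether any input was consumed before the outcome.
newtype Parser a = Parser { runParser :: String -> (Bool, Maybe (a, String)) }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = Parser $ \s -> (False, Just (a, s))
  (<*>) = ap

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    (c, Nothing)        -> (c, Nothing)
    (c, Just (a, rest)) -> let (c', r) = runParser (k a) rest in (c || c', r)

-- Consume one specific character; fail without consuming otherwise.
char :: Char -> Parser Char
char x = Parser $ \s -> case s of
  (c : rest) | c == x -> (True, Just (c, rest))
  _                   -> (False, Nothing)

-- Commit-if-advance: the second branch is tried only when the first
-- branch failed without consuming any input.
orCommit :: Parser a -> Parser a -> Parser a
orCommit (Parser p) (Parser q) = Parser $ \s -> case p s of
  (False, Nothing) -> q s
  r                -> r

-- Rollback: whenever the first branch fails, the second branch is
-- tried from the same starting point, however far the first got.
orRollback :: Parser a -> Parser a -> Parser a
orRollback (Parser p) (Parser q) = Parser $ \s -> case p s of
  (_, Nothing) -> q s
  r            -> r

-- Run a parser and keep only the result value.
parse :: Parser a -> String -> Maybe a
parse p = fmap fst . snd . runParser p

ab, ac :: Parser Char
ab = char 'a' >> char 'b'
ac = char 'a' >> char 'c'
```

On the input "ac", parse (ab `orCommit` ac) gives Nothing, because ab consumed the 'a' before failing and so ac is never attempted, while parse (ab `orRollback` ac) gives Just 'c'. The rollback version has to keep the original input alive until the first branch finishes, which is the extra saved state discussed next.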
The management of saved state on the stack of pending operations is much
simpler with your commit-if-advance semantics and much more complicated with my
rollback semantics. Oddly, it seems your committed operations do not
immediately release the pending handlers so they can be garbage collected. This
same kind of issue motivated me to improve the implementation of binary-strict's
incremental get.
On a different note:
Hmmm....In the MyGetW implementation I could add a fancier
throwError/Alternative command that allows the user to "commit" to the current
branch and immediately release/discard the pending handler/second branch.
Something like:
> d = mplus (mplus a b) c where
>   a = do comThing <- getCommitter
>          x <- getWord32be
>          catchError (commitTo comThing >> throwError "WTF")
>                     (\errMsg -> liftIO (print errMsg))
>   b = liftIO (print "b")
>   c = liftIO (print "c")
In the above pseudo code the "commitTo" will cause the throwError to bypass both
the (\errMsg -> ...) handler and the "b" alternative. It will go to "c" instead.
The "comThing" is an opaque value that is the unique ID of the current error
handler frame, in the above case it is the frame for "mplus a b". So the
commitTo causes the system to immediately abandon (and allow garbage collection)
of the (\errMsg -> ...) and "b" code.
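
A rough sketch of how such a commit could work, assuming an explicit stack of handler frames with unique IDs. This is not the MyGetW implementation: the type H, the Frame record, and the helpers catchErr, throwErr, orElse, say and run are hypothetical, there is no input state, and output is collected in a pure log instead of using liftIO.

```haskell
import Control.Monad (ap, liftM, when)
import Data.Maybe (listToMaybe)

-- Final answer: log lines plus either an uncaught error or success.
type Out = ([String], Either String ())

-- A pending error handler, tagged with a unique frame ID.
data Frame = Frame { frameId :: Int, onError :: String -> [Frame] -> Int -> Out }

-- CPS computation threading the handler stack and a fresh-ID counter.
newtype H a = H { runH :: [Frame] -> Int -> ([Frame] -> Int -> a -> Out) -> Out }

instance Functor H where fmap = liftM
instance Applicative H where
  pure a = H $ \st n k -> k st n a
  (<*>) = ap
instance Monad H where
  H m >>= f = H $ \st n k -> m st n (\st' n' a -> runH (f a) st' n' k)

-- Append a line to the log (stands in for liftIO . print).
say :: String -> H ()
say s = H $ \st n k -> let (logs, res) = k st n () in (s : logs, res)

-- Pop the top frame and run its handler; no frame means uncaught error.
throwErr :: String -> H a
throwErr msg = H $ \st n _ -> case st of
  []         -> ([], Left msg)
  (f : rest) -> onError f msg rest n

-- Push a frame for the handler; pop it on success if it is still on top.
catchErr :: H a -> (String -> H a) -> H a
catchErr body handler = H $ \st n k ->
  let frame = Frame n (\msg st' n' -> runH (handler msg) st' n' k)
      popIf (f : rest) | frameId f == n = rest
      popIf st'                         = st'
  in runH body (frame : st) (n + 1) (\st' n' a -> k (popIf st') n' a)

orElse :: H a -> H a -> H a
orElse x y = catchErr x (const y)

-- Opaque ID of the innermost handler frame at the time of the call.
newtype Committer = Committer (Maybe Int)

getCommitter :: H Committer
getCommitter = H $ \st n k -> k st n (Committer (frameId <$> listToMaybe st))

-- Discard every frame down to and including the named one.
commitTo :: Committer -> H ()
commitTo (Committer Nothing)    = pure ()
commitTo (Committer (Just fid)) = H $ \st n k -> k (dropThrough st) n ()
  where
    dropThrough st
      | any ((== fid) . frameId) st = drop 1 (dropWhile ((/= fid) . frameId) st)
      | otherwise                   = st

run :: H () -> Out
run m = runH m [] 0 (\_ _ () -> ([], Right ()))

-- The example from above, with the commit switchable on and off.
demo :: Bool -> Out
demo commit = run ((a `orElse` b) `orElse` c)
  where
    a = do comThing <- getCommitter
           catchErr (when commit (commitTo comThing) >> throwErr "WTF")
                    (\errMsg -> say ("handled " ++ errMsg))
    b = say "b"
    c = say "c"
```

Here demo True evaluates to (["c"], Right ()) and demo False to (["handled WTF"], Right ()). After commitTo the discarded frames are no longer referenced from the stack, so in this model nothing keeps the inner handler or "b" alive.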
Hmmm....
--
Chris