A fancier Get monad or two (a la binary and binary-strict)

Chris Kuklewicz haskell at list.mightyreason.com
Wed Jul 30 17:34:19 EDT 2008


Johan Tibell wrote:
> I've written what I believe to be a similar, continuation-based
> parser. I haven't uploaded my latest patches (basically faster
> combinators) but the idea can be seen in the file here:
> 
> http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Parser.hs;h=8086b11bfeb3bca15bfd16ec9c6a4b34aadf528e;hb=HEAD
> 
> The use case is parsing HTTP without resorting to lazy I/O.

I have just read through your code.  It is quite similar.

The differences:
   Your error handling is via Alternative, and if the first branch advances 
(consumes input) then the second branch is not attempted.  The state can only go 
forward (by 1 byte) or remain in place (there is no look-ahead).  When a branch 
reads past the end of the current input, one of two things happens once more 
input arrives: (*) the first new byte is accepted and the branch continues, or 
(*) the first new byte is rejected and the Alternative is still free to try the 
second branch.

   In MyGet/MyGetW/MyGetSimplified the MonadError semantics are different.  If 
the first alternative fails then the parser state is rolled back and the second 
alternative is tried from the same starting point as the first was tried.  If 
the first alternative triggered more input from IPartial then this input is still 
visible to the second alternative.
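
To make the contrast concrete, here is a toy sketch.  It is not the code from 
Hyena or from MyGet: it parses a plain String, has no incremental input, and 
the names Parser, orCommit and orRollback are made up for illustration.  The 
only point is the difference in what happens after the first branch fails.

```haskell
-- Toy parser.  The Bool records whether any input was consumed,
-- even when the parse goes on to fail (Nothing).
newtype Parser a = Parser { runParser :: String -> (Bool, Maybe (a, String)) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    (c, r) -> (c, fmap (\(a, rest) -> (f a, rest)) r)

instance Applicative Parser where
  pure a    = Parser $ \s -> (False, Just (a, s))
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    (c, Nothing)        -> (c, Nothing)
    (c, Just (a, rest)) -> case runParser (k a) rest of
      (c', r) -> (c || c', r)

-- Match exactly one given character.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x:xs) | x == c -> (True, Just (x, xs))
  _               -> (False, Nothing)

-- Match a whole literal, one character at a time.
string :: String -> Parser String
string = mapM char

-- Commit-if-advance: the second branch runs only if the first
-- failed WITHOUT consuming anything.
orCommit :: Parser a -> Parser a -> Parser a
orCommit (Parser p) (Parser q) = Parser $ \s -> case p s of
  (False, Nothing) -> q s
  r                -> r

-- Rollback: on any failure the second branch restarts from the
-- same input the first branch started from.
orRollback :: Parser a -> Parser a -> Parser a
orRollback (Parser p) (Parser q) = Parser $ \s -> case p s of
  (_, Nothing) -> q s
  r            -> r
```

On the input "GOT", (string "GET" `orCommit` string "GOT") fails, because the 
first branch consumed the 'G' before failing, while 
(string "GET" `orRollback` string "GOT") succeeds with "GOT".  The rollback 
version has to keep the original input alive until the first branch finishes, 
which is where the extra saved state comes from.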

   The management of saved state on the stack of pending operations is much 
simpler with your commit-if-advance semantics and much more complicated with my 
rollback semantics.  Oddly, it seems your committed operations do not 
immediately release the pending handlers so they can be garbage collected.  This 
same kind of issue motivated me to improve the implementation of binary-strict's 
incremental get.

On a different note:

Hmmm....In the MyGetW implementation I could add a fancier 
throwError/Alternative command that allows the user to "commit" to the current 
branch and immediately release/discard the pending handler/second branch. 
Something like:

> d = mplus (mplus a b) c where
>   a = do comThing <- getCommitter
>          x <- getWord32be
>          catchError (commitTo comThing >> throwError "WTF")
>                     (\errMsg -> liftIO (print errMsg))
>   b = liftIO (print "b")
>   c = liftIO (print "c")

In the above pseudo code the "commitTo" will cause the throwError to bypass both 
the (\errMsg -> ...) handler and the "b" alternative.  It will go to "c" instead.

The "comThing" is an opaque value that is the unique ID of the current error 
handler frame; in the above case it is the frame for "mplus a b".  So the 
commitTo causes the system to immediately abandon (and allow garbage collection 
of) the (\errMsg -> ...) handler and the "b" code.
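
The bookkeeping can be modelled, very roughly, as a stack of handler frames. 
This is only a pure sketch, assuming frame IDs simply increase with nesting 
depth.  Frame, push, currentFrame, commitTo and nextHandler are hypothetical 
names for illustration and are not part of MyGetW.

```haskell
import Data.Maybe (listToMaybe)

type FrameId = Int

-- One pending error handler or alternative branch.
data Frame = Frame { frameId :: FrameId, frameLabel :: String }

-- Newest frame first.
type Stack = [Frame]

-- Push a new handler; its ID is the current depth (an assumption
-- of this sketch, so IDs grow with nesting).
push :: String -> Stack -> Stack
push label st = Frame (length st) label : st

-- What a "getCommitter" would return: the ID of the innermost frame.
currentFrame :: Stack -> Maybe FrameId
currentFrame = fmap frameId . listToMaybe

-- Discard the named frame and everything nested inside it, so the
-- dropped handlers are no longer referenced and can be collected.
commitTo :: FrameId -> Stack -> Stack
commitTo fid = dropWhile ((>= fid) . frameId)

-- Where a throwError would land: the innermost remaining frame.
nextHandler :: Stack -> Maybe String
nextHandler = fmap frameLabel . listToMaybe
```

For the example above, the stack inside the catchError is 
push "handler" (push "b" (push "c" [])), and comThing was taken from 
push "b" (push "c" []), so it is frame 1.  Without a commit, nextHandler picks 
"handler"; after commitTo 1 only the "c" frame is left, so the throwError 
lands there.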

Hmmm....

-- 
Chris

