[Haskell] Re: ANNOUNCE: CC-delcont-0.1; Delimited continuations for Haskell

Sun Jul 15 06:57:23 EDT 2007

This is intended as an illustration of how one might use the CC-delcont
library, and/or what it might be good for at all. For more, a mailing
list search for 'oleg' would likely be fruitful, as this library is
heavily derived from the delimited continuation implementation he's
used in past mails.

Essentially, this is a port of Oleg's 'Incremental, undoable parsing'
as seen on the OCaml mailing list. It does lack some of the properties
of the OCaml implementation, however (which I'll talk more about later).
The original mail is available here:

http://caml.inria.fr/pub/ml-archives/caml-list/2007/07/7a34650001bf6876b71c7b1060ac501f.en.html

This message is (hopefully) literate haskell, and should be able to be
run directly, as long as the CC-delcont library is installed.

> {-# OPTIONS_GHC -fglasgow-exts #-}

> module Parse where

> import Text.ParserCombinators.Parsec

> import Control.Monad
> import Control.Monad.Trans
> import Control.Monad.CC
> import Control.Monad.CC.Dynvar

------------
The Inverter
------------

First comes a datatype that is, in some sense, a reification of the act
of parsing. Done, of course, means that the parsing is over, and it
holds the result of parsing. ReqChar means that the parser is paused,
waiting for more input. It holds a position, indicating how many
characters it's read so far.

> data StreamReq m a where
>     Done    :: Monad m => a -> StreamReq m a
>     ReqChar :: Monad m => Int -> (Maybe Char -> m (StreamReq m a)) -> 
StreamReq m a

> fromDone :: StreamReq m a -> a
> fromDone (Done a) = a

> instance Show (StreamReq m a) where
>     show (Done _)      = "Done"
>     show (ReqChar p _) = "Position: " ++ show p

And now comes the magic. toList below takes a function that, given a
position in a stream (list), returns the element that should be at that
position, and builds such a stream. The important part is that this
function works in the context of an arbitrary monad. streamInv is such
a "position -> element" function, but when asked for a character, it
captures 'the rest of the stream production,' and reifies it into a
StreamReq using delimited continuations. This is what allows one to,
in essence, get a pointer into 'the act of parsing,' and pass such
things around, and even have multiple pointers in play at once.

> streamInv :: MonadDelimitedCont p s m =>
>        p (StreamReq m a) -> Int -> m (Maybe Char) 
> streamInv p pos = shift p (\sk -> return $ ReqChar pos (sk . return))

> toList :: Monad m => (Int -> m (Maybe a)) -> m [a]
> toList f = toList' 0
>  where
>  toList' n = f n >>= maybe (return []) (\c -> liftM (c:) $ toList' (n+1))

The following three functions operate on the resumable parsers, either
providing input, or telling the parser that there is no more input.
Hopefully they're fairly straight forward.

> provide :: Monad m => Char -> StreamReq m a -> m (StreamReq m a)
> provide _ d@(Done _)    = return d
> provide c (ReqChar _ f) = f $ Just c

> finish :: Monad m => StreamReq m a -> m (StreamReq m a)
> finish d@(Done _)    = return d
> finish (ReqChar _ f) = f Nothing

> provideSome :: Monad m => String -> StreamReq m a -> m (StreamReq m a)
> provideSome [] s     = return s
> provideSome (x:xs) s = provide x s >>= provideSome xs

The following is sort of a wrapper. It turns a parsing function into
a resumable parser. As can be seen, the parser function is pure; you
can turn any existing parse function into a resumable parser in this
manner. In other words, there's no need to worry about having to go
through all your code, putting in delimited continuation constraints
on all your functions; no "monad pollution" is involved.

> invertParse :: (MonadIO m, MonadDelimitedCont p s m) =>
>       (String -> a) -> m (StreamReq m a)
> invertParse parser = reset $ \p -> (Done . parser)
>                                `liftM` toList (streamInv p)  

----------
The parser
----------

This is the parser that will be inverted in the example. It reads a
sum of several numbers; something like:

    1 + 23 + 46 + 59 + 102

and returns the list of numbers to be summed (as an [Integer]).

As can be seen, this is just an ordinary Parsec parser.

> plus    = char '+' >> spaces >> return (++)
> number  = many1 digit >>= \n -> spaces >> return (read n :: Integer)
> total p = p >>= \a -> getInput >>= guard . null >> return a
> sump    = total $ chainl1 (liftM return number) plus

-----
Usage
-----

This portion finally makes use of the above stream inversion to
interesting effect (I hope). It will repeatedly ask for input
to parse, gradually feeding that input into the parser, until
a blank line is provided, at which point it assumes the user is
finished.

However, at each user input, the program stops to see if the
input up to that point would have been a successful parse. If so,
it saves that partial parser. If, when the user signals the end
of input, the parse fails, the program will resume taking input
from the last successful parse.

The saving of the successful parses is achieved through the use
of the dynamically scoped variables also included in this library.
They can, essentially, be viewed as more flexible versions of
the reader or state monads (depending on whether you use the
set operations or not) that can be used within the context of
delimited continuation monads.

> prompt q p = do liftIO $ print p
>                 liftIO $ putStrLn "(Empty line to stop)"
>                 liftIO $ putStr ": "
>                 l <- liftIO getLine
>                 if null l
>                    then finish p
>                    else saveValid q p >> provideSome l p >>= prompt q

> saveValid q p = do p' <- finish p
>                    case p' of
>                         Done (Right _) -> dset q p >> return ()
>                         _              -> return ()

> check _ p@(Done (Right _)) = return p
> check q p = do liftIO $ putStrLn "Invalid parse."
>                liftIO $ putStrLn "Restarting from last good parse."
>                dupp q (prompt q) >>= check q

> main :: IO ()
> main = runCCT $ do q <- dnew
>                    p <- invertParse (parse sump "Equation")
>                    dlet q p (prompt q p >>= check q
>                                >>= liftIO . print . fromDone) 

And now for the caveats... If one does some experimentation, he
will find that things don't work exactly the same as the OCaml
version. For instance, if one tries to emulate the OCaml example
exactly, by doing something like (unchecked; I think the code is
well-formed):

  reset (\p -> toList (streamInv p) >>= return . lexer
            >>= printTokens >> return (Done ()))

he'll find that tokens are only printed once a given resumable
parser is closed (which may, of course, happen multiple times),
and that if he closes multiple parsers from the same source,
tokens will be printed for each parser, even over the shared
portions. I believe these have the following causes:

1) The delay is due to being highly-sequentialized. Before the
lexer and printTokens can do anything, toList has to get the
entire list to pass along the monadic pipeline. It can't be
produced lazily, because the pieces are being pulled out of
monadic computations, which have to be sequenced.

2) The duplication is (possibly) a related matter. One way to solve
#1 (I think) would be to parse over a list which incorporated monads.
So a cons would have a value at head, and as a tail, a monadic action
to produce the rest of the list. If this were used, only enough monadic
actions as are needed to produce the output would be performed. If the
output of the lexer were produced similarly, the printing could execute
in a lazy fashion, and, possibly, the branching caused by the continuation
monad could be limited to only the portions of the parse that aren't
duplicated. For an example of this sort of idea explored, see "ListT
done right" on the wiki:

   http://haskell.org/haskellwiki/ListT_done_right

However, such a solution involves a lot of the monad creep that people
tend not to like, and which the above code avoids.

The duplication of the output of a printTokens function isn't a big deal;
after all, in most cases, one would want to get whole results out of
alternate parses, *and then* inspect/print them. However, the over-strictness
can be a pain. In fact, it has a direct consequence for the above code,
in that parse errors are never detected until an inverted parser is closed.
With a lazy list input, the following is well defined:

   parse sump "Sum" ("1 + 2 - 3" ++ undefined)

The result will be a parse error, as the '-' is immediately recognized as
a problem. However, if one feeds "1 + 2 - 3" into a resumable parser, it
will not recognize that the parse cannot complete, and advance to Done.
Instead, it waits until the parser is closed, at which point, it can
produce the input list, and get to the parse error. This is one reason
that the code above periodically closes the parser, checks for a good
parse, and saves the unclosed version if it finds one. However, there are
parsers where prefixes of correct input will not themselves parse
(consider a parser for something like "1 + 2 + 3 + 4 = 10"; without
an '=' and right-hand side of the equation, it's not a correct parse,
though it may be a prefix to one), and the above method won't be able to
restart as closely to an error as one that, say, saved after each input
but resumed when legitimately bad input automatically kicked the
parser to Done.

Whether this is enough motivation to push monads further into one's data
structures, or drop Haskell and go with implicitly monadic code everywhere
in OCaml is left as an exercise to the reader. :)

Anyhow, I hope I've demonstrated something of interest, and given you
a taste of what you can do with this library, and how to do it. There is
some more explanation and some elementary examples in the haddock, although
due to the code's use of some exotic GHC-isms, haddock.ghc, at the least,
will be needed to generate the documentation, and I have yet to get that
fully working here, so there may still be bugs in the haddock.

Cheers,
Dan Doel