# [Haskell-cafe] A very nontrivial parser

Claus Reinke claus.reinke at talk21.com
Sat Jul 7 20:13:42 EDT 2007

> Now take decodeRLEb and feed it's output to some nontrivial parser, and
> then feed the remainder of the input, unmodified, into another parser:

so the code as posted didn't exhibit a full use case. that specification is
still a bit vague. assuming that p1: decodeRLE, p2: nontrivial parser, and
p3: another parser, it could be interpreted simply as parser combination:

do { output <- p1; x <- p2 output; y <- p3; return (x,y) }

or perhaps you meant to run p2 over the output of p1 in a separate parser
chain, with the remaining input left by p1, not by p2, being fed into p3?

do { output <- p1; Just  x <- return $evalStateT p2 output; y <- p3; return (x,y) } then we'd have something like p2 stack p1 = do { out <- p1; Just x <- return$ evalStateT p2 out; return x }

> Since I don't know how much data other_stuff is going to consume - let
> alone how much of the raw data you have to feed to decodeRLEb to make
> that much data - we arrive at the structure shown.

ah, that suggests yet another specification, a variation of the second
version above, where the parser in control is not p1 itself, but p2, with
p1 acting as an input transformation for p2, and p3 resuming where
p1 left off. the difference being that p2's demand is supposed to drive
p1's input processing. which is a bit of a problem.

parsers are usually data- and grammar-driven, not demand-driven,
ie the input consumed by p1 does not usually depend on the demands
on p1's output. one could let p1 generate results of increasing length,
and let p2 pick a result that fits, but that would involve rerunning p2
on the complete prefix of too-short results, backtracking into p1 until
it produces an output useable by p2 - not exactly elegant or efficient,
but it would fit the second variant above (one would have to ensure
that p1 backtracked only over the length of input consumed, eg, an
outermost 'many', and that the shortest alternative was produced first).

looking a little bit more closely, however, p1 is used more as a
piecewise input transformation for p2 than as a separate parser.
so it makes more sense to integrate p1 into p2 (or rather: parts
of p1 - if p1 is 'many group', then we should integrate only 'group';
in other words, we'd like to run p1 repeatedly, in minimal-much
mode, rather than the more typical once, in maximal-munch mode),
so that the latter uses some part of p1 as its item parser (which,
in turn, assumes that p2 has a single, identifiable item parser -
called 'fetch' here, and no other way to access the parse state).

that seems to be what you have in mind with your stacked
approach, where the state is read exclusively through the fetch
method in the Source interface, and a Source can either be a
plain list or buffered item parser stacked on top of a Source
(where fetch is served from the buffer, which is replenished
by running the item parser over the inner Source; btw, are
unused buffer contents discarded when p2 finishes? they
can't be transformed back into p1/p3's input format..).

instead of using a type class, one could also parameterise p2
by its item parser, 'fetch'. that might make it easier to see that
this stacking is a kind of parser composition. unlike the
standard function and monad compositions, this one relies
on the compositional nature of combinator parsers: there's an
item parser, which accesses the input and produces output,
and there is a coordination framework (the combinatorial
grammar) specifying how the item parser is to be used.

function composition allows us to abstract over each part of
the composed function, including the inner function in a 'stack'
of functions:

\x->f (g x)
==> -- abstract over g
(f .)

we can try to view parsers as composed from a grammar
and an item parser, where the latter is the 'inner' part of
this composition:

\s->(item >> item) s mplus item s
==> -- abstract over item
\item s->(item >> item) s mplus item s

turning item/fetch into a type class method is just another
way of composing the grammar with an item parser.

i had to implement it myself to understand what you were
trying to do, and how.. if indeed i have understood?-)

hth,
claus

> (This makes it, what, the 5th time I've explained this? LOL...)

with problem specifications, it isn't quantity that counts.
the more ambiguous the specification, the more likely it
is that replies interpret it in ways that do not answer the
question. the fewer replies seem to address the question,
the more likely it is that the specification needs to be clearer.

on a high-volume list where readers might dip into and out
of long threads at any point, repetition in the form of concise
summaries can be helpful, even to those readers who might
follow every post in every thread.



More information about the Haskell-Cafe mailing list