[Haskell-cafe] A very nontrivial parser

Sat Jul 7 20:13:42 EDT 2007

> Now take decodeRLEb and feed its output to some nontrivial parser, and 
> then feed the remainder of the input, unmodified, into another parser:

so the code as posted didn't exhibit a full use case. that specification is 
still a bit vague. assuming that p1: decodeRLE, p2: nontrivial parser, and 
p3: another parser, it could be interpreted simply as parser combination:

    do { output <- p1; x <- p2 output; y <- p3; return (x,y) }

or perhaps you meant to run p2 over the output of p1 in a separate parser 
chain, with the remaining input left by p1, not by p2, being fed into p3?

    do { output <- p1; Just x <- return $ evalStateT p2 output; y <- p3; return (x,y) }

then we'd have something like

    p2 `stack` p1 = do { out <- p1; Just x <- return $ evalStateT p2 out; return x }
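
to make the second reading concrete, here is a minimal sketch. it assumes that parsers are StateT over Maybe, with a token list as state (using the transformers package); 'item', 'group', 'pair' and 'example' are made-up stand-ins for illustration, not code from this thread:

```haskell
import Control.Monad.Trans.State (StateT (..), evalStateT)

-- a parser over a token list; Nothing signals failure (an assumption)
type Parser s a = StateT [s] Maybe a

-- consume a single token
item :: Parser s s
item = StateT $ \s -> case s of
  []       -> Nothing
  (x : xs) -> Just (x, xs)

-- hypothetical p1: decode one run-length group (count, value)
group :: Parser Int [Int]
group = do { n <- item; x <- item; return (replicate n x) }

-- hypothetical p2: read two items
pair :: Parser s (s, s)
pair = do { a <- item; b <- item; return (a, b) }

-- run p2 over p1's output; the remaining input is what p1 left
stack :: Parser b a -> Parser s [b] -> Parser s a
p2 `stack` p1 = do { out <- p1; Just x <- return $ evalStateT p2 out; return x }

-- p3 (here: item) resumes where p1 stopped, not where p2 stopped
example :: Parser Int ((Int, Int), Int)
example = do { x <- pair `stack` group; y <- item; return (x, y) }
```

with these definitions, runStateT example [3,7,9] gives Just (((7,7),9),[]): group turns 3,7 into [7,7,7], pair takes two of those, the third is silently dropped, and item sees the 9. if p2 fails on p1's output, the pattern match on Just fails the whole parser.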

> Since I don't know how much data other_stuff is going to consume - let 
> alone how much of the raw data you have to feed to decodeRLEb to make 
> that much data - we arrive at the structure shown.

ah, that suggests yet another specification, a variation of the second 
version above, where the parser in control is not p1 itself, but p2, with 
p1 acting as an input transformation for p2, and p3 resuming where 
p1 left off. the difference being that p2's demand is supposed to drive
p1's input processing. which is a bit of a problem.

parsers are usually data- and grammar-driven, not demand-driven,
ie the input consumed by p1 does not usually depend on the demands
on p1's output. one could let p1 generate results of increasing length, 
and let p2 pick a result that fits, but that would involve rerunning p2 
on the complete prefix of too-short results, backtracking into p1 until 
it produces an output useable by p2 - not exactly elegant or efficient, 
but it would fit the second variant above (one would have to ensure
that p1 backtracked only over the length of input consumed, eg, an
outermost 'many', and that the shortest alternative was produced first). 
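
a rough sketch of that backtracking variant, assuming list-of-successes parsers and treating p2 as a plain function on p1's complete output; 'groupsShortestFirst', 'stackBT' and 'firstFour' are names invented here:

```haskell
-- list-of-successes parser: every parse, paired with the remaining input
type P s a = [s] -> [(a, [s])]

-- hypothetical p1: decode 0, 1, 2, .. run-length groups, shortest result first
groupsShortestFirst :: P Int [Int]
groupsShortestFirst s =
  ([], s) :
  [ (replicate n x ++ out, rest')
  | (n : x : rest) <- [s]
  , (out, rest') <- groupsShortestFirst rest
  ]

-- rerun p2 from scratch on each candidate output until one is long enough
stackBT :: ([b] -> Maybe a) -> P s [b] -> P s a
stackBT p2 p1 s = take 1 [ (x, rest) | (out, rest) <- p1 s, Just x <- [p2 out] ]

-- hypothetical p2: needs exactly four items of decoded data
firstFour :: [b] -> Maybe [b]
firstFour (a : b : c : d : _) = Just [a, b, c, d]
firstFour _                   = Nothing
```

here stackBT firstFour groupsShortestFirst [3,7,2,8,5,9] yields [([7,7,7,8],[5,9])]: p2 is tried on [], then on [7,7,7], and succeeds on [7,7,7,8,8], so p1 stops after two groups. the repeated work in p2 is exactly the inefficiency mentioned above.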

looking a little bit more closely, however, p1 is used more as a 
piecewise input transformation for p2 than as a separate parser. 
so it makes more sense to integrate p1 into p2 (or rather: parts 
of p1 - if p1 is 'many group', then we should integrate only 'group'; 
in other words, we'd like to run p1 repeatedly, in minimal-munch 
mode, rather than the more typical once, in maximal-munch mode), 
so that the latter uses some part of p1 as its item parser (which, 
in turn, assumes that p2 has a single, identifiable item parser - 
called 'fetch' here, and no other way to access the parse state). 

that seems to be what you have in mind with your stacked
approach, where the state is read exclusively through the fetch
method in the Source interface, and a Source can either be a
plain list or a buffered item parser stacked on top of a Source
(where fetch is served from the buffer, which is replenished 
by running the item parser over the inner Source; btw, are 
unused buffer contents discarded when p2 finishes? they 
can't be transformed back into p1/p3's input format..).
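
my reading of that design, as a guess rather than your actual code: the class below fixes the item type to Int to keep things short, and 'Plain', 'RLE', 'unstack' and 'fetchN' are hypothetical names.

```haskell
-- the only access to the parse state is through fetch
class Source src where
  fetch :: src -> Maybe (Int, src)

-- a plain list as Source
newtype Plain = Plain [Int]

instance Source Plain where
  fetch (Plain (x : xs)) = Just (x, Plain xs)
  fetch (Plain [])       = Nothing

-- a buffered run-length decoder stacked on top of another Source
data RLE src = RLE [Int] src

instance Source src => Source (RLE src) where
  -- serve fetch from the buffer while it lasts
  fetch (RLE (b : bs) inner) = Just (b, RLE bs inner)
  -- otherwise replenish the buffer by decoding one group from the inner Source
  fetch (RLE [] inner) = do
    (n, inner')  <- fetch inner
    (x, inner'') <- fetch inner'
    fetch (RLE (replicate n x) inner'')

-- hand the inner Source back to p3; unused buffer contents are discarded
unstack :: RLE src -> src
unstack (RLE _ inner) = inner

-- hypothetical p2: fetch k items, driving the decoder by demand
fetchN :: Source src => Int -> src -> Maybe ([Int], src)
fetchN 0 s = Just ([], s)
fetchN k s = do
  (x, s')   <- fetch s
  (xs, s'') <- fetchN (k - 1) s'
  return (x : xs, s'')
```

with this, fetchN 4 (RLE [] (Plain [3,7,2,8,5,9])) returns the items [7,7,7,8] and leaves RLE [8] (Plain [5,9]): only two groups of raw input were decoded, and the leftover 8 in the buffer is the part that unstack throws away.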

instead of using a type class, one could also parameterise p2
by its item parser, 'fetch'. that might make it easier to see that
this stacking is a kind of parser composition. unlike the 
standard function and monad compositions, this one relies 
on the compositional nature of combinator parsers: there's an 
item parser, which accesses the input and produces output, 
and there is a coordination framework (the combinatorial 
grammar) specifying how the item parser is to be used. 

function composition allows us to abstract over each part of
the composed function, including the inner function in a 'stack'
of functions:

    \x->f (g x) 
    ==> -- abstract over g
    (f .)

we can try to view parsers as composed from a grammar
and an item parser, where the latter is the 'inner' part of
this composition: 

    \s->(item >> item) s `mplus` item s 
    ==> -- abstract over item
    \item s->(item >> item) s `mplus` item s

turning item/fetch into a type class method is just another
way of composing the grammar with an item parser.
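
the same idea without a type class, as a small sketch: the grammar takes its item parser as an ordinary parameter. this assumes StateT over Maybe from the transformers package; 'twoItems', 'plain' and 'rle' are illustrative names only.

```haskell
import Control.Monad.Trans.State (StateT (..))

-- the grammar, abstracted over its item parser
twoItems :: Monad m => m a -> m (a, a)
twoItems item = do { a <- item; b <- item; return (a, b) }

-- item parser 1: one token from a plain list
plain :: StateT [s] Maybe s
plain = StateT $ \s -> case s of
  []       -> Nothing
  (x : xs) -> Just (x, xs)

-- item parser 2 (hypothetical): the state is (buffer, raw input);
-- an empty buffer is refilled by decoding one run-length group
rle :: StateT ([Int], [Int]) Maybe Int
rle = StateT step
  where
    step (b : bs, raw)     = Just (b, (bs, raw))
    step ([], n : x : raw) = step (replicate n x, raw)
    step _                 = Nothing
```

the same grammar then composes with either item parser: runStateT (twoItems plain) [3,7,2,8] gives Just ((3,7),[2,8]), while runStateT (twoItems rle) ([],[3,7,2,8]) gives Just ((7,7),([7],[2,8])).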

i had to implement it myself to understand what you were
trying to do, and how.. if indeed i have understood?-)

hth,
claus

> (This makes it, what, the 5th time I've explained this? LOL...)

with problem specifications, it isn't quantity that counts.
the more ambiguous the specification, the more likely it
is that replies interpret it in ways that do not answer the
question. the fewer replies seem to address the question,
the more likely it is that the specification needs to be clearer.

on a high-volume list where readers might dip into and out
of long threads at any point, repetition in the form of concise
summaries can be helpful, even to those readers who might 
follow every post in every thread.