[Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?
Henning Thielemann
lemming at henning-thielemann.de
Mon May 14 09:36:56 EDT 2007
On Fri, 11 May 2007, Malcolm Wallace wrote:
> > *Text.ParserCombinators.PolyLazy>
> > runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined)
> > ("*** Exception: Parse.satisfy: failed
>
> This output is exactly correct. You asked for the first four characters
> provided that they were alphabetic, but in fact only the first three
> were alphabetic. Hence, 'satisfy' failed and threw an exception. If
> you ask for only the first three characters, then the parse succeeds:
The problem is evidently that a later wrong character can make the whole
parse fail. Thus successfully generated data is not returned until the
whole input has been parsed and checked. How can I suppress checking the
whole input? How can I tell the parser that everything it has parsed so
far will not be invalidated by further input? How can I rewrite the above
example so that it returns
("abc*** Exception: Parse.satisfy: failed
?
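
To make the desired behaviour concrete, here is a minimal sketch that does
not use polyparse at all. The names strictExactly and lazyExactly are
hypothetical helpers written only for illustration, and they assume a
non-negative count. The first reports failure through Either, so no part
of the result is available before the whole parse has been checked. The
second embeds the failure as an 'error' at the point in the result where
it occurs, so the prefix parsed so far can be consumed lazily.

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper, not part of polyparse.
-- Failure is reported globally via Either: the caller cannot see any
-- parsed character until all n characters have been checked.
strictExactly :: Int -> (Char -> Bool) -> String
              -> Either String (String, String)
strictExactly n _ rest | n <= 0 = Right ([], rest)
strictExactly _ _ []            = Left "unexpected end of input"
strictExactly n p (c:cs)
  | p c       = case strictExactly (n - 1) p cs of
                  Right (xs, rest) -> Right (c : xs, rest)
                  Left err         -> Left err
  | otherwise = Left "satisfy: failed"

-- Hypothetical helper, not part of polyparse.
-- Failure is local: each accepted character is emitted immediately,
-- and the error sits in the tail of the result where parsing failed.
lazyExactly :: Int -> (Char -> Bool) -> String -> (String, String)
lazyExactly n _ rest | n <= 0 = ([], rest)
lazyExactly _ _ []            = (error "unexpected end of input", [])
lazyExactly n p (c:cs)
  | p c       = let (xs, rest) = lazyExactly (n - 1) p cs  -- lazy binding
                in  (c : xs, rest)
  | otherwise = (error "satisfy: failed", c : cs)
```

With these definitions, strictExactly 4 isAlpha ("abc104"++undefined)
yields only Left "satisfy: failed", whereas printing
lazyExactly 4 isAlpha ("abc104"++undefined) should emit the prefix ("abc
before the exception is raised. The price is that the consumer only
learns about the failure when it forces the offending part of the result.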
I wondered whether 'commit' would help, but it did not. (I thought it
would convert a global 'fail' into a local 'error'.)
*Text.ParserCombinators.PolyLazy>
runParser (exactly 4 (commit (satisfy Char.isAlpha))) ("abc104"++undefined)
*** Exception: Parse.satisfy: failed