[Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

Henning Thielemann lemming at henning-thielemann.de
Mon May 14 09:36:56 EDT 2007


On Fri, 11 May 2007, Malcolm Wallace wrote:

> > *Text.ParserCombinators.PolyLazy>
> >       runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined)
> > ("*** Exception: Parse.satisfy: failed
>
> This output is exactly correct.  You asked for the first four characters
> provided that they were alphabetic, but in fact only the first three
> were alphabetic.  Hence, 'satisfy' failed and threw an exception.  If
> you ask for only the first three characters, then the parse succeeds:

The problem is obviously that a later wrong character can make the whole
parse fail. Thus successfully generated data is not returned until the
whole input has been parsed and checked. How can I suppress checking the
whole input? How can I tell the parser that everything it has parsed so far
will not be invalidated by further input? How can I rewrite the above
example so that it returns
  ("abc*** Exception: Parse.satisfy: failed
?
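
To make the desired behaviour concrete, here is a minimal sketch that does
not use polyparse at all. The name 'exactlyLazy' and its error messages are
hypothetical, chosen only for illustration. It works on plain lists and
emits every accepted item immediately, so a failure shows up as an
exception at the point in the output where the bad item was met, not as a
failure of the whole result.

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper, not part of polyparse: lazily yield the first n
-- items of the input, requiring each to satisfy p. Every accepted item is
-- emitted before the next one is inspected, so a failure only surfaces
-- when the consumer demands the offending position.
exactlyLazy :: Int -> (a -> Bool) -> [a] -> [a]
exactlyLazy n p input
  | n <= 0    = []
  | otherwise = case input of
      []     -> error "exactlyLazy: unexpected end of input"
      (x:xs)
        | p x       -> x : exactlyLazy (n - 1) p xs
        | otherwise -> error "exactlyLazy: failed"

main :: IO ()
main =
  -- Only three items are demanded, so neither the non-alphabetic '1' nor
  -- the trailing undefined is ever forced.
  putStrLn (take 3 (exactlyLazy 4 isAlpha ("abc104" ++ undefined)))
```

Demanding the fourth item as well would raise the error only after "abc"
has already been produced. Note that this sketch discards the remaining
input and the parser state, which a real parser combinator would have to
return alongside the result.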

I wondered whether 'commit' would help, but it didn't. (I thought it would
convert a global 'fail' into a local 'error'.)

*Text.ParserCombinators.PolyLazy>
    runParser (exactly 4 (commit (satisfy Char.isAlpha))) ("abc104"++undefined)
*** Exception: Parse.satisfy: failed

