[Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

Henning Thielemann lemming at henning-thielemann.de
Mon May 21 09:09:25 EDT 2007


On Mon, 14 May 2007, Malcolm Wallace wrote:

> Henning Thielemann <lemming at henning-thielemann.de> wrote:
>
> > > > *Text.ParserCombinators.PolyLazy>
> > > >       runParser (exactly 4 (satisfy Char.isAlpha))
> > > >       ("abc104"++undefined)
> > > > ("*** Exception: Parse.satisfy: failed
> >
> > How can I rewrite the above example that it returns
> >   ("abc*** Exception: Parse.satisfy: failed
>
> The problem in your example is that the 'exactly' combinator forces
> parsing of 'n' items before returning any of them.  If you roll your
> own, then you can return partial results:
>
>     > let one = return (:) `apply` satisfy (Char.isAlpha)
>       in runParser (one `apply` (one `apply`
>                    (one `apply` (one `apply` return []))))
>              ("abc104"++undefined)
>     ("abc*** Exception: Parse.satisfy: failed
>
> Equivalently:
>
>     > let one f = ((return (:)) `apply` satisfy (Char.isAlpha)) `apply` f
>       in runParser (one (one (one (one (return []))))) ("abc104"++undefined)
>     ("abc*** Exception: Parse.satisfy: failed

 I wonder whether 'apply' merges two separate ideas: applying a function
produced by one parser to the value produced by another, and forcing a
parser to always succeed. From the documentation of 'apply' I assumed that
'apply f x' fails if either 'f' or 'x' fails. Instead, it seems to succeed
as soon as 'f' succeeds. Wouldn't it be better to have an explicit 'force'
that declares a parser to never fail, and that returns 'undefined' if this
assumption turns out to be wrong?
 I have seen this 'force' in the MIDI loader of Haskore:
  http://darcs.haskell.org/haskore/src/Haskore/General/Parser.hs

 With such a 'force', the following would hold:
   apply f x  ==  do g <- f; fmap g (force x)
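
 A minimal sketch of this idea, assuming a simplified lazy parser type over
String. Everything below (the 'Parser' newtype, 'runP', 'satisfy',
'runParser', 'force', 'fourAlpha') is a hypothetical stand-in written for
illustration; it is neither polyparse's actual implementation nor the
'force' from Haskore. It only shows how 'apply' could be built from an
ordinary strict bind plus an explicit 'force':

```haskell
import Data.Char (isAlpha)

-- Hypothetical simplified parser: a result or an error message,
-- together with the remaining input.
newtype Parser a = P { runP :: String -> (Either String a, String) }

instance Functor Parser where
  fmap f (P p) = P $ \s -> case p s of
    (Left e,  rest) -> (Left e, rest)
    (Right a, rest) -> (Right (f a), rest)

instance Applicative Parser where
  pure a    = P $ \s -> (Right a, s)
  pf <*> px = pf >>= \f -> fmap f px   -- strict: fails if either side fails

instance Monad Parser where
  P p >>= k = P $ \s -> case p s of
    (Left e,  rest) -> (Left e, rest)
    (Right a, rest) -> runP (k a) rest

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = P $ \s -> case s of
  (c:cs) | ok c -> (Right c, cs)
  _             -> (Left "Parse.satisfy: failed", s)

-- Declare that a parser never fails. The result is wrapped in 'Right'
-- without inspecting it; a failure only surfaces as an exception when
-- the parsed value itself is demanded.
force :: Parser a -> Parser a
force (P p) = P $ \s ->
  let (r, rest) = p s                  -- lazy pattern: p s is not run yet
  in  (Right (either error id r), rest)

-- 'apply' expressed as a strict bind on 'f' plus 'force' on 'x'.
apply :: Parser (a -> b) -> Parser a -> Parser b
apply pf px = do { g <- pf; fmap g (force px) }

-- Top-level runner: a parse failure becomes an exception in the value.
runParser :: Parser a -> String -> (a, String)
runParser (P p) s = case p s of
  (Left e,  rest) -> (error e, rest)
  (Right a, rest) -> (a, rest)

-- The hand-rolled "four letters" parser from the quoted example.
fourAlpha :: Parser String
fourAlpha =
  let one = return (:) `apply` satisfy isAlpha
  in  one `apply` (one `apply` (one `apply` (one `apply` return [])))
```

 With these definitions,
   take 3 (fst (runParser fourAlpha ("abc104"++undefined)))
yields "abc", because each list cell is produced before the next item is
parsed. Replacing 'apply' by the strict '<*>' would make the whole result
fail, like 'exactly' does.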

