[Haskell-beginners] parser comparison question
daniel.is.fischer at web.de
Sat Apr 25 04:34:29 EDT 2009
Am Samstag 25 April 2009 04:31:29 schrieb Walck, Scott:
> I've been comparing Graham Hutton's simple parser with Parsec. Here is
> some example code.
> -- Comparison of Hutton's simple parser with Parsec
> -- Hutton's simple parser is available at
> -- http://www.cs.nott.ac.uk/~gmh/Parsing.lhs
> -- Hutton simple parser called H
> -- Parsec called P
> import qualified Parsing as H
> import qualified Text.ParserCombinators.Parsec as P
> exH = H.parse (H.string "hi") "hiho"
> exP = P.parse (P.string "hi") "" "hiho"
> charH = H.parse (H.char 'a' H.+++ H.char 'b') "bbb"
> charP = P.parse (P.char 'a' P.<|> P.char 'b') "" "bbb"
> choiceH = H.parse (H.string "hoo" H.+++ H.string "ho") "hono"
> choiceP1 = P.parse (P.string "bb" P.<|> P.string "ba") "" "bbb"
> choiceP2 = P.parse (P.string "ba" P.<|> P.string "bb") "" "bbb"
> choiceP22 = P.parse (P.try (P.string "ba") P.<|> P.string "bb") "" "bbc"
> I am interested if anyone could comment on the design of the Parsec 'try'
> function. For example, choiceP2 fails and returns Left error, while
> choiceP22 succeeds.
> Hutton's simple parser doesn't need try. It seems so simple and elegant.
> I'm wondering why Parsec requires me to use 'try' for a string parse that
> might fail.
In short: efficiency
The simple parser is a fully backtracking parser, therefore it has to keep the whole input
available in case a parse fails and an alternative has to be tried.
Parsec's alternative (<|>) tries the second parser only if the first parser failed
*without consuming any input*, so the input consumed so far can be immediately discarded,
which is more efficient. To get backtracking behaviour you must wrap the first parser in a
'try', which makes it either succeed or fail without consuming any input.
Using try means keeping more of the input available, which has a performance cost, so use
try sparingly, try to write your parsers so that they either succeed or fail (almost)
immediately (P.try (P.string "ba") requires only a short part of the input kept available,
so it doesn't hurt performance measurably, but imagine you have a parser that can fail
after having consumed 500MB of input. Keeping that around to try the second alternative on
as the simple parser must do will hurt.)
> Scott N. Walck
> Associate Professor of Physics
> Lebanon Valley College
More information about the Beginners