[Haskell-cafe] attoparsec and backtracking

Sat Mar 16 02:54:08 CET 2013

Evan Laforge wrote:

> The first is that it's hard to get the right error msg out.  For
> instance, I have a parser that tries to parse a number with an
> optional type suffix.  It's an error if the suffix is unrecognized:
> 
> p_num :: A.Parser Score.TypedVal
> p_num = do
>     num <- p_untyped_num
>     suffix <- A.option "" ((:"") <$> A.letter_ascii)
>     case Score.code_to_type suffix of
>         Nothing -> fail $ "p_num expected suffix in [cdsr]: " ++ show suffix
>         Just typ -> return $ Score.Typed typ num

I think the mistake here is to parse something and then decide if
its it valid. It should be the parser which decides whether its
valid. So rather than:

     suffix <- A.option "" ((:"") <$> A.letter_ascii)

try:

     typ <- A.choice [ {- list or valid suffix parsers -} ]
     return $ Score.Typed typ num

> However, which error msg shows up depends on the order of the (<|>)
> alternatives, and in general the global structure of the entire
> parser, because I think it just backtracks and then picks the last
> failing backtrack.

I'm not sure if what I've offered will help, but its worth a try.

> Even after carefully rearranging all the parsers
> it seems impossible to get this particular error to bubble up to the
> top.

Yes, I've found it impossible to force attoparsec to fail a parse.
I think that is intended as a feature.

> The thing is, as soon as I see an unexpected suffix I know I can
> fail entirely right there, with exactly that error msg, but since
> there's no way to turn off backtracking I think there's no way to do
> that.

Yes, that's my impression.

<snip>

> I
> originally used parsec, but parsing speed is my main bottleneck, so I
> don't want to give ground there.

We you using Parsec as a token parser or as a Char parser. Obviously
the second is going to be slow in comparison to the first.

> I've heard some good things about traditional alex+happy...
> of course it would mean a complete rewrite but might be interesting.

I've user Alex with both Parsec and Happy, where speed was strong
secondary goal. Personally I much prefer Parsec; IMO its easier to debug
and extend and modify that Happy based parsers. I also know some people
prefer Happy.

> Has anyone compared the performance of attoparsec vs. alex+happy?

I haven't, nor have I compared those two with alex+parsec. It would
be an interesting experiment.

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/