[Haskell-cafe] [ANN] faster-megaparsec

Fri Nov 11 08:39:06 UTC 2022

> Do you provide reproducible benchmarks for these claims? I'd love to
> see the outcome for various types of projects to make the choice for
> myself.
> 
> Unsubstantiated claims of "This is Faster™️" are the path to shame
> and treachery.
> 

Dear Emily, 

version 0.1.1 [1] now contains a benchmark suite, where words of
randomly generated languages are parsed with both parsers (ParsecT and
StateT Text Maybe). For tiny to moderately sized languages, the speed-
up is not huge but noticeable. (I need to work on improving test case
generation.)

I've been using this trick for a long time but only now decided to spin
off a dedicated package. My experience with parsing large amounts of
tabular data files shows there is a more substantial difference,
although that might be due to my lack of understanding how to write
optimized megaparsec parsers. Still your point is valid that this is
merely anecdotal evidence. 
If time permits, I shall add a more realistic benchmark with randomly
generated CSV data or Finn Bender's haskell-parsing-benchmarks [2]. 

Cheers,
Olaf

[1] https://hackage.haskell.org/package/faster-megaparsec-0.1.1.0
[2] https://gitlab.com/FinnBender/haskell-parsing-benchmarks

> On Tue, Nov 08, 2022 at 1:31 PM, Olaf Klinke < olf at aatal-
> apotheke.de > wrote:
> 
> > 
> > 
> > 
> > Dear Haskellers,
> > 
> > 
> > 
> > megaparsec [1] is a parsing library with excellent error reporting.
> Being
> > able to emit decent parse error messages is traded off against
> speed,
> > however. Wouldn't it be nice to have a faster parser for your daily
> > parsing needs, but obtain megaparsec's informative error messages
> for the
> > rare cases when things go wrong?
> > 
> > 
> > 
> > Thanks to megaparsec's parser type class, MonadParsec, you can have
> both:
> > I wrote a MonadParsec instance for StateT s Maybe (s being the
> input
> > stream type such as Text or ByteString) which does not do all the
> > bookkeeping ParsecT needs to do. If you manage to write your parser
> only
> > using the type class primitives, then you can specialize your
> parser to
> > StateT s Maybe. Only if that returns Nothing specialize again to
> ParsecT
> > Void s and parse the input again, thus obtaining a decent error
> message.
> > The package faster-megaparsec [1,2] provides a convenient function
> to do
> > just that.
> > 
> > 
> > 
> > Of course StateT s Maybe can not provide all the features ParsecT
> has:
> > * StateT s Maybe is always backtracking.
> > * It does not know the offset in the stream.
> > * It must hold the entire input stream in memory for a second pass,
> so no
> > garbage collecting the consumed input while parsing. So if your
> parsing
> > relies on non-backtracking choice, stream offset or garbage-
> collecting
> > input while parsing, you can not use faster- megaparsec as a drop-
> in
> > replacement.
> > 
> > 
> > 
> > Happy parsing!
> > Olaf
> > 
> > 
> > 
> > [1] https:/ / hackage. haskell. org/ package/ megaparsec (
> > https://hackage.haskell.org/package/megaparsec )
> > [2] https:/ / hackage. haskell. org/ package/ faster-megaparsec (
> > https://hackage.haskell.org/package/faster-megaparsec )
> > [3] https://hub.darcs.net/olf/faster-megaparsec
>