[Haskell-cafe] Haskell-Cafe Digest, Vol 180, Issue 32

Michal J Gajda mgajda at mimuw.edu.pl
Fri Aug 31 12:53:18 UTC 2018


I got 4.7s for a similar amount of data in 2013.
However, I was pretty sure that a fully inlined implementation could
potentially go 5x faster.
http://hackage.haskell.org/package/hPDB

Please check the xeno XML parser benchmarks for another example.
https://hackage.haskell.org/package/xeno
On Fri, 31 Aug 2018 at 14:41, <haskell-cafe-request at haskell.org> wrote:

> Today's Topics:
>
>    1. Re: HDBC packages looking for maintainer (Tobias Dammers)
>    2. Re: Alternative instance for non-backtracking parsers
>       (Olaf Klinke)
>    3. Re: Alternative instance for non-backtracking parsers
>       (Bardur Arantsson)
>
>
>
> ---------- Forwarded message ----------
> From: Tobias Dammers <tdammers at gmail.com>
> To: haskell-cafe at haskell.org
> Cc:
> Bcc:
> Date: Thu, 30 Aug 2018 15:24:04 +0200
> Subject: Re: [Haskell-cafe] HDBC packages looking for maintainer
> Hi,
>
> I'd be interested. I've used HDBC on a few projects, and my yeshql
> library was originally built with HDBC as the only backend. It would be
> a terrible shame to see this bitrot.
>
> Cheers,
>
> Tobias (tdammers on github etc.)
>
> On Mon, Aug 13, 2018 at 12:07:38PM +0200, Erik Hesselink wrote:
> > Hi all,
> >
> > I've been the maintainer for some of the HDBC packages for a while now.
> > Sadly, I've mostly neglected them due to lack of time and usage. While the
> > packages mostly work, there are occasional pull requests and updates for
> > new compiler versions.
> >
> > Because of this I'm looking for someone who wants to take over HDBC and
> > related packages [1]. If you use HDBC and would like to take over
> > maintainership, please let me know and we can get things set up.
> >
> > Regards,
> >
> > Erik
> >
> > [1] https://github.com/hdbc
>
> > _______________________________________________
> > Haskell-Cafe mailing list
> > To (un)subscribe, modify options or view archives go to:
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> > Only members subscribed via the mailman list are allowed to post.
>
>
> --
> Tobias Dammers - tdammers at gmail.com
>
>
>
>
> ---------- Forwarded message ----------
> From: Olaf Klinke <olf at aatal-apotheke.de>
> To: PY <aquagnu at gmail.com>
> Cc: haskell-cafe <haskell-cafe at haskell.org>
> Bcc:
> Date: Thu, 30 Aug 2018 20:21:07 +0200
> Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers
> > Hello, Olaf. I have some distrust of elegant solutions (one of them are
> > C.P. libs).
>
> I have a program that parses several CSV files, one of them 50MB in size,
> and writes its result as HTML. When I started optimizing, the execution
> time was 144 seconds. Profiling (thanks to Jasper Van der Jeugt for writing
> profiteur!) revealed that most of the time was spent parsing and
> postprocessing the 50MB CSV file. Changing the data structure of the
> postprocessing stage cut down the execution time to 32 seconds, but still
> the majority is spent on parsing.
> Then I realized that (StateT String Maybe) is a parser which conveniently
> has all the class instances one needs, most notably its Alternative
> instance makes it a backtracking parser. After defining a few combinators I
> was able to swap out my megaparsec parser for the new parser, which
> slashed execution time in half. Now most of the parsing time is dedicated
> to transforming text to numbers and dates. I doubt that parsing time can be
> reduced much further [*]. The new parser was identical to the old parser,
> only the combinators now come from another module. That is the elegant
> thing about monadic parser libraries.
> I will now use the fast parser by default, and if it returns a Nothing,
> the program will suggest a command line flag that switches to the original
> megaparsec parser, exactly telling the user where the parse failed and why.
> I am not sure whether there is another family of parsers whose
> interfaces are so similar that switching from one package to another is as
> effortless as it is with monadic parsers.
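
A minimal sketch of the (StateT String Maybe) idea described above, assuming
only the transformers package. The combinator names (satisfy, char, eof, int,
sepBy, intRow, parseAll) are illustrative and are not the combinators from the
original program, which were not posted.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad.Trans.State.Strict (StateT (..), evalStateT)
import Data.Char (isDigit)

-- A parser is a state transition on the remaining input that may fail.
-- Functor, Applicative, Monad and Alternative all come for free.
type Parser = StateT String Maybe

-- Consume one character if it satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = StateT $ \s -> case s of
  (c : cs) | p c -> Just (c, cs)
  _              -> Nothing

-- Consume exactly the given character.
char :: Char -> Parser Char
char c = satisfy (== c)

-- Succeed only when no input is left.
eof :: Parser ()
eof = StateT $ \s -> if null s then Just ((), s) else Nothing

-- One or more digits, read as an Int.
int :: Parser Int
int = read <$> some (satisfy isDigit)

-- Zero or more occurrences of p, separated by sep.
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = ((:) <$> p <*> many (sep *> p)) <|> pure []

-- A whole line of comma-separated integers.
intRow :: Parser [Int]
intRow = (int `sepBy` char ',') <* eof

-- Run a parser, discarding the remaining input.
parseAll :: Parser a -> String -> Maybe a
parseAll = evalStateT
```

Here parseAll intRow "1,22,333" yields Just [1,22,333], while a malformed line
such as "1,,2" yields Nothing. Because (<|>) on Maybe re-runs the second
alternative on the original state, the parser backtracks automatically. The
price is that a failure carries no position or reason, which is why a second
parser is needed for diagnostics.
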
>
> Cheers
> Olaf
>
> [*] To the parser experts on this list: How much time should a parser take
> that processes a 50MB, 130000-line text file, extracting 5 values (String,
> UTCTime, Int, Double) from each line?
>
>
>
> ---------- Forwarded message ----------
> From: Bardur Arantsson <spam at scientician.net>
> To: haskell-cafe at haskell.org
> Cc:
> Bcc:
> Date: Thu, 30 Aug 2018 21:43:55 +0200
> Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers
> On 30/08/2018 20.21, Olaf Klinke wrote:
> >> Hello, Olaf. I have some distrust of elegant solutions (one of them are
> >> C.P. libs).
> >
> > [*] To the parser experts on this list: How much time should a parser
> > take that processes a 50MB, 130000-line text file, extracting 5 values
> > (String, UTCTime, Int, Double) from each line?
>
> Not an expert, but for something as (relatively!) standard as CSV, I'd
> probably go for a specialized solution like 'cassava', which seems like
> it does quite well according to https://github.com/haskell-perf/csv
>
> Based purely on the lines/second numbers on that page and the number you've
> given, I'd guesstimate that your parsing could potentially be as fast as
> (3.185ms / 1000 lines) * 130000 lines = 414.05ms = 0.4 s.
>
> (Of course that still doesn't account for extracting the Int, Double,
> etc., but there are also specialized solutions for that which should be
> pretty hard to beat, see e.g. bytestring-lexing.)
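
As a rough illustration of specialized field conversion, here is a sketch that
uses readInt from the bytestring package as a stand-in (it is not the
bytestring-lexing API). The helper names fieldInt and intFields are
hypothetical, and the comma split does no CSV quoting or escaping.

```haskell
import qualified Data.ByteString.Char8 as B

-- Convert a whole field to an Int, rejecting trailing garbage.
fieldInt :: B.ByteString -> Maybe Int
fieldInt bs = case B.readInt bs of
  Just (n, rest) | B.null rest -> Just n
  _                            -> Nothing

-- Split one line on commas (no quoting support) and convert every field.
-- The result is Nothing as soon as any field fails to convert.
intFields :: B.ByteString -> Maybe [Int]
intFields = traverse fieldInt . B.split ','
```

Working directly on strict ByteString avoids building a String per field, which
is typically where a generic Char-based parser spends its time.
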
>
> It's also probably a bit less elegant than a generic parsec-like thing,
> but that's to be expected for a more special-case solution.
>
> Regards,
>
>