[Haskell-cafe] Alternative instance for non-backtracking parsers
spam at scientician.net
Thu Aug 30 19:43:55 UTC 2018
On 30/08/2018 20.21, Olaf Klinke wrote:
>> Hello, Olaf. I have some distrust of elegant solutions (one of them are
>> C.P. libs).
> [*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
Not an expert, but for something as (relatively!) standard as CSV, I'd
probably go for a specialized solution like 'cassava', which seems like
it does quite well according to https://github.com/haskell-perf/csv
Based purely on the lines/second numbers on that page and the line count
you've given, I'd guesstimate that your parsing could potentially be as fast as
(3.185 ms / 1000 lines) * 130000 lines = 414.05 ms, i.e. roughly 0.4 s.
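For anyone who wants to plug in a different benchmark row or file size, the same back-of-the-envelope estimate can be written down as a tiny program. This is only a sketch: the function names are made up for illustration, and the inputs are just the figures quoted in this thread (3.185 ms per 1000 lines, 130000 lines, 50 MB).

```haskell
-- Back-of-the-envelope estimate; the names here are illustrative only.

-- | Scale a "milliseconds per 1000 lines" benchmark figure to a line count.
estimateMs :: Double -> Double -> Double
estimateMs msPerThousandLines lineCount =
  msPerThousandLines / 1000 * lineCount

-- | Throughput in MB/s implied by a file size (MB) and a duration (ms).
throughputMBps :: Double -> Double -> Double
throughputMBps sizeMB millis = sizeMB / (millis / 1000)

main :: IO ()
main = do
  let ms = estimateMs 3.185 130000
  putStrLn ("estimated time (ms): " ++ show ms)
  putStrLn ("implied throughput (MB/s): " ++ show (throughputMBps 50 ms))
```

With the numbers above this comes out at roughly 414 ms, or about 120 MB/s for the 50MB file. That is a lower bound taken from someone else's benchmark, not a measurement of the actual data.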
(Of course that still doesn't account for extracting the Int, Double,
etc., but there are also specialized solutions for that which should be
pretty hard to beat, see e.g. bytestring-lexing.)
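To give an idea of what the cassava route might look like, here is a minimal sketch. The column layout (name, timestamp, count, value) and the sample data are assumptions, since the actual file format wasn't posted. The timestamp is left as a raw ByteString because, as far as I know, cassava does not ship a field parser for UTCTime, so that conversion would be a separate step.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (NoHeader), decode)
import qualified Data.Vector as V

-- Hypothetical row layout (an assumption, not taken from the real file):
-- name, timestamp (kept unparsed), count, value.
type Row = (B.ByteString, B.ByteString, Int, Double)

-- | Decode header-less CSV input into a vector of rows.
-- cassava picks the field parsers from the tuple's component types.
decodeRows :: BL.ByteString -> Either String (V.Vector Row)
decodeRows = decode NoHeader

-- Made-up sample input, for illustration only.
sample :: BL.ByteString
sample =
  "sensor-a,2018-08-30T19:43:55Z,42,3.5\n\
  \sensor-b,2018-08-30T19:44:10Z,7,0.25\n"

main :: IO ()
main = case decodeRows sample of
  Left err   -> putStrLn ("parse error: " ++ err)
  Right rows -> V.mapM_ print rows
```

For the real file one would read the input with `BL.readFile` instead of using an inline sample. Whether this gets anywhere near the benchmark numbers would have to be measured.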
It's also probably a bit less elegant than a generic parsec-like thing,
but that's to be expected for a more special-case solution.