[Haskell-cafe] Alternative instance for non-backtracking parsers

Will Yager will.yager at gmail.com
Mon Sep 3 15:29:35 UTC 2018



On Aug 30, 2018, at 11:21, Olaf Klinke <olf at aatal-apotheke.de> wrote:

> 
> [*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?

The combination of attoparsec + a streaming adapter for pipes/conduit/streaming should easily be able to handle tens of megabytes per second and hundreds of thousands of lines per second. 
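As a rough illustration of what "attoparsec + a streaming adapter" can look like, here is a minimal sketch using conduit and the conduitParser function from conduit-extra. This is my assumption about one reasonable setup, not the code from the repository linked below; the names lineP and parseChunks are made up for this example, and lineP is simply the one-line parser quoted later in this message.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Conduit (mapC, runConduit, sinkList, yieldMany, (.|))
import qualified Data.Attoparsec.ByteString as A
import           Data.Attoparsec.ByteString.Char8 (endOfLine, isEndOfLine)
import           Data.ByteString (ByteString)
import           Data.Conduit.Attoparsec (conduitParser)

-- Hypothetical name: one line of input, without its terminator.
lineP :: A.Parser ByteString
lineP = A.takeTill isEndOfLine <* endOfLine

-- Hypothetical helper: run the parser repeatedly over a stream of chunks.
-- conduitParser feeds chunks to attoparsec incrementally, so a line may
-- span a chunk boundary. It yields (position, result) pairs; we keep
-- only the results.
parseChunks :: [ByteString] -> IO [ByteString]
parseChunks chunks =
  runConduit $ yieldMany chunks .| conduitParser lineP .| mapC snd .| sinkList
```

For a real file you would presumably replace yieldMany chunks with sourceFile path and runConduit with runConduitRes (both from the Conduit module), and fold over the results instead of collecting them in a list.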

For an example, check out https://github.com/wyager/Callsigns/blob/master/Callsigns.hs, which parses a pipe-separated-value file from the FCC pretty quickly. As I recall, it goes through a >100MB file in under three seconds, and it has to do a bunch of other work besides.

I also ported the above code to use Streaming instead of Pipes. I recall that using Streaming master, the parser I use to read the dictionary:

takeTill isEndOfLine <* endOfLine

handles about 3 million lines per second. I can’t remember what the number is for Pipes, but it’s probably similar. That’s really good for something so simple to write!
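For anyone who wants to try that parser on its own, here is a small self-contained sketch. The imports are my guess at a setup that typechecks for strict ByteString input (isEndOfLine from the Char8 module is a predicate on Word8, so it pairs with the Word8-based takeTill); the names line and allLines are only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString as A
import           Data.Attoparsec.ByteString.Char8 (endOfLine, isEndOfLine)
import           Data.ByteString (ByteString)

-- One line, without its terminator. endOfLine accepts "\n" or "\r\n".
line :: A.Parser ByteString
line = A.takeTill isEndOfLine <* endOfLine

-- Hypothetical helper: parse a whole in-memory input as a list of lines.
-- Every line, including the last, must end in a newline, otherwise
-- endOfInput fails and the result is a Left.
allLines :: ByteString -> Either String [ByteString]
allLines = A.parseOnly (A.many' line <* A.endOfInput)
```

In this sketch, allLines "a\nb\r\n" should give Right ["a","b"], while an input whose last line lacks a terminator is rejected.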

Unfortunately there is a performance bug in Streaming that’s fixed in master, but the fix has gone unreleased for a number of months :-/

—Will