[Haskell-cafe] parsec or attoparsec for 40-50MB text files ?

briand at aracnet.com
Sun Jun 7 23:04:37 UTC 2015


Hi,

My file is a pretty straightforward text file with a small amount of somewhat annoying state:

comments*
config line
comments*
data line*

If there is no config line, it's an error. The data lines can have a variable number of values, and it matters how many values there are (hey, it's not my file format!). The data lines can also have a comment at the end.

My initial thought was to go with parsec, but the data files could be as large as 40-50MB, and upon further reading it really seemed like attoparsec would be better. Error handling wouldn't need to be too sophisticated: if a data line has something other than one or more floating point values and the optional comment, failing out with "error line X" is fine.
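
For illustration, here is a minimal sketch of how a line-at-a-time approach might look with attoparsec. It assumes details the format description above does not give: that '#' starts a comment, that the config line begins with the word "config", and that blank or comment-only lines may appear anywhere. The names Line, lineP and parseFile are made up for this sketch. Each line is parsed on its own with parseOnly, so the line number for "error line X" comes from numbering the input lines rather than from the parser itself.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative ((<|>))
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- One classified input line. The syntax is hypothetical:
-- '#' starts a comment, "config" starts the config line.
data Line
  = Blank                -- empty or comment-only line
  | Config B.ByteString  -- text following the "config" keyword
  | Values [Double]      -- one or more floating point values
  deriving (Eq, Show)

-- Skip spaces and tabs.
hspace :: A.Parser ()
hspace = A.skipWhile (\c -> c == ' ' || c == '\t')

-- Optional trailing comment; consumes the rest of the line.
comment :: A.Parser ()
comment = A.option () (A.char '#' *> A.skipWhile (const True))

-- Parser for a single line (no newline); must consume all of it.
lineP :: A.Parser Line
lineP = hspace *> (config <|> values <|> pure Blank) <* comment <* A.endOfInput
  where
    config = Config <$> (A.string "config" *> hspace *> A.takeWhile (/= '#'))
    values = Values <$> A.many1 (A.double <* hspace)

-- Parse a whole file: the first non-blank line must be the config line,
-- every later non-blank line must be a data line.
parseFile :: B.ByteString -> Either String (B.ByteString, [[Double]])
parseFile input = do
  ls <- traverse parseNumbered (zip [1 ..] (B.lines input))
  case filter ((/= Blank) . snd) ls of
    (_, Config c) : rest -> (,) c <$> traverse dataLine rest
    _                    -> Left "error: no config line before data"
  where
    parseNumbered (n, l) =
      either (const (Left (errLine n))) (\x -> Right (n, x)) (A.parseOnly lineP l)
    dataLine (_, Values vs) = Right vs
    dataLine (n, _)         = Left (errLine n)
    errLine :: Int -> String
    errLine n = "error line " ++ show n
```

Something like parseFile <$> B.readFile "data.txt" (file name hypothetical) would then give either the config text plus one list of values per data line, or a Left with the offending line number. The length of each inner list is how many values that line had. Whether this is fast enough for 40-50MB inputs would need to be measured.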

Parse time is somewhat critical only because I'll have multiple files to parse, so while 5-10 seconds is OK for one file, I have to multiply that by 5-10.

I've seen several comments saying that parsec can be slow, but so far I've been unable to find anything that quantifies "slow".

Any opinions on which would be better for my application (although I think I've just talked myself into using attoparsec)?

In particular, am I going to get at least reasonable "error on line X" error handling using attoparsec?


Thanks,

Brian



