[Haskell-cafe] parsec or attoparsec for 40-50MB text files ?

Chris Wong lambda.fairy at gmail.com
Mon Jun 8 00:23:19 UTC 2015


Hi Brian,

Parsec and Attoparsec have very similar interfaces (afaik the main
difference is that Attoparsec always backtracks on failure, so its
"try" combinator is a no-op), so there's no harm in trying both.
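
To make the "try" point concrete, here is a minimal Parsec sketch of a
data-line parser. The '#' comment marker, the simple number syntax, and
the names number, dataLine and parseDataLine are all assumptions for
illustration, since the real file format isn't shown in this thread.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A simple decimal number: optional '-', digits, optional fraction.
-- (Assumed syntax; exponents are not handled in this sketch.)
number :: Parser Double
number = do
  sign  <- option "" (string "-")
  whole <- many1 digit
  frac  <- option "" ((:) <$> char '.' <*> many1 digit)
  pure (read (sign ++ whole ++ frac))

-- One or more space-separated numbers, then an optional '#' comment.
-- The comment character is an assumption, not taken from the real format.
dataLine :: Parser [Double]
dataLine = do
  skipMany (char ' ')
  x  <- number
  -- Parsec needs 'try' here: the separator may consume spaces before
  -- 'number' fails on a trailing comment or the end of the line.
  xs <- many (try (many1 (char ' ') *> number))
  skipMany (char ' ')
  optional (char '#' *> many (noneOf "\n"))
  eof
  pure (x : xs)

-- Hypothetical helper that runs the parser on a single line.
parseDataLine :: String -> Either ParseError [Double]
parseDataLine = parse dataLine "data line"
```

As far as I know, porting this to Attoparsec would mostly mean changing
the imports and dropping the "try", since a failed alternative there
rewinds the input on its own.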

Alternatively: if the data format is simple enough, you can write the
parser by hand. The Data.Text.Read module may help if you pursue this
option. [1]
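
A rough sketch of the hand-written route, using "double" from
Data.Text.Read. The '#' comment marker and the name readDataLine are
assumptions for illustration; the error message follows the
"error line X" wording from the original question.

```haskell
import qualified Data.Text as T
import Data.Text.Read (double)

-- Parse one data line by hand: drop an optional trailing comment,
-- split on whitespace, and read every field with 'double'.
-- Assumption: comments start with '#'.
readDataLine :: Int -> T.Text -> Either String [Double]
readDataLine lineNo line
  | null fields = Left msg            -- at least one value is required
  | otherwise   = traverse field fields
  where
    msg    = "error line " ++ show lineNo
    fields = T.words (T.takeWhile (/= '#') line)
    -- Accept a field only if 'double' consumes all of it.
    field t = case double t of
      Right (x, rest) | T.null rest -> Right x
      _                             -> Left msg
```

The line number has to be supplied by the caller, for example by
zipping [1 ..] with the result of T.lines.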

Chris

[1]: https://hackage.haskell.org/package/text-1.2.1.1/docs/Data-Text-Read.html

On Mon, Jun 8, 2015 at 11:04 AM, <briand at aracnet.com> wrote:
> Hi,
>
> My file is a pretty straightforward text file with a small amount of somewhat annoying state:
>
> comments*
> config line
> comments*
> data line*
>
> if there is no config line it's an error.  the data lines can have a variable number of values and it matters how many values there are (hey- it's not my file format !).  the data lines can also have a comment at the end.
>
> My initial thought was to go with parsec but the data files could be as large as 40-50MB and upon further reading it really seemed like attoparsec would be better. Error handling wouldn't be too sophisticated.  if a data line has something other than 1 or more floating point values and the optional comment, failing out with "error line X" is fine.
>
> parse time is somewhat critical only because i'll have multiple files to parse, so while 5-10 seconds is ok for one file, i have to multiply that by 5-10.
>
> I've seen several comments talking about the fact that parsec can be slow, but so far I've been unable to find anything that quantifies "slow".
>
> Any opinions on which would be better for my application (although i think i've just talked myself into using attoparsec) ?
>
> In particular- am i going to get at least reasonable "error on line X" error handling using attoparsec ?
>
>
> Thanks,
>
> Brian



-- 
https://lambda.xyz

