[Haskell-cafe] parsec or attoparsec for 40-50MB text files ?
Chris Wong
lambda.fairy at gmail.com
Mon Jun 8 00:23:19 UTC 2015
Hi Brian,
Parsec and Attoparsec have very similar interfaces (afaik the only
difference is that Attoparsec backtracks by default, so the "try"
combinator is a no-op), so there's no harm in trying both.
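To make the backtracking difference concrete, here is a minimal Parsec sketch. `noTry` and `withTry` are illustrative names. On the input "for", `string "foo"` consumes "fo" before it fails, and Parsec's `<|>` only tries the next branch if the failed branch consumed nothing. Wrapping the first branch in `try` rewinds the input. In Attoparsec, the version without `try` would already succeed, because its `<|>` always backtracks.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Without 'try': on "for", 'string "foo"' consumes "fo" before it fails,
-- and Parsec's '<|>' only tries the next branch if nothing was consumed.
noTry :: Parser String
noTry = string "foo" <|> string "for"

-- With 'try': the failed first branch is rewound, so "for" can still match.
withTry :: Parser String
withTry = try (string "foo") <|> string "for"
```

With this, `parse noTry "" "for"` returns a `Left` error, while `parse withTry "" "for"` returns `Right "for"`.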
Alternatively: if the data format is simple enough, you can write the
parser by hand. The Data.Text.Read module may help if you pursue this
option. [1]
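A minimal hand-written sketch of a data-line parser built on Data.Text.Read. It assumes, hypothetically, that comments start with `#`, because the thread doesn't say. The sketch strips any trailing comment, splits the rest on whitespace, and requires `double` to consume each token completely. `parseDataLine` is an illustrative name, not part of any library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- ASSUMPTION: a comment starts with '#' and runs to the end of the line.
-- Returns the line's values, or Nothing if the line has no values or
-- contains a token that isn't a complete number.
parseDataLine :: T.Text -> Maybe [Double]
parseDataLine line =
  case T.words (fst (T.breakOn "#" line)) of
    [] -> Nothing                 -- a data line needs at least one value
    ws -> mapM field ws           -- Nothing as soon as one token is bad
  where
    -- 'TR.double' accepts an optional sign, a fraction and an exponent;
    -- the leftover text must be empty for the token to count.
    field w = case TR.double w of
      Right (x, rest) | T.null rest -> Just x
      _                             -> Nothing
```

For example, `parseDataLine "1.5 -2 3e2 # note"` gives `Just [1.5,-2.0,300.0]`, and a line with a non-numeric token gives `Nothing`. The caller can then report the failing line number.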
Chris
[1]: https://hackage.haskell.org/package/text-1.2.1.1/docs/Data-Text-Read.html
On Mon, Jun 8, 2015 at 11:04 AM, <briand at aracnet.com> wrote:
> Hi,
>
> My file is a pretty straightforward text file with a small amount of somewhat annoying state:
>
> comments*
> config line
> comments*
> data line*
>
> If there is no config line, it's an error. The data lines can have a variable number of values, and it matters how many values there are (hey, it's not my file format!). The data lines can also have a comment at the end.
>
> My initial thought was to go with parsec, but the data files could be as large as 40-50MB, and upon further reading it really seemed like attoparsec would be better. Error handling doesn't need to be sophisticated: if a data line has anything other than one or more floating-point values and the optional comment, failing out with "error line X" is fine.
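For comparison, here is a sketch of that grammar in Parsec. The thread doesn't specify the comment character, the config line's contents, or the number syntax. The choices below are therefore assumptions: `#` comments, the config line kept as raw text, and tokens checked with `reads`. Names such as `dataFile` and `parseFile` are hypothetical. The "error line X" message relies on Parsec's own position tracking via `sourceLine (errorPos err)`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- ASSUMPTIONS (hypothetical; the exact format isn't given in the thread):
--   * comments start with '#' and run to the end of the line
--   * the config line is the first non-blank, non-comment line, kept raw
--   * every line, including the last one, ends with '\n'
data DataFile = DataFile { configText :: String, rows :: [[Double]] }
  deriving Show

-- Spaces and tabs only; newlines are significant in this format.
blanks :: Parser ()
blanks = skipMany (oneOf " \t")

comment :: Parser ()
comment = char '#' >> skipMany (noneOf "\n")

-- A blank or comment-only line; 'try' rewinds if the line has other content.
ignorable :: Parser ()
ignorable = try (blanks >> optional comment >> newline >> return ())

configLine :: Parser String
configLine = many1 (noneOf "\n") <* newline <?> "config line"

-- One numeric token; 'reads' must consume all of it to count as a number.
number :: Parser Double
number = do
  tok <- many1 (oneOf "0123456789.eE+-")
  case reads tok of
    [(x, "")] -> return x
    _         -> fail ("not a number: " ++ tok)

-- One or more values, then an optional trailing comment.
dataLine :: Parser [Double]
dataLine = do
  blanks
  xs <- many1 (number <* blanks)
  optional comment
  _ <- newline
  return xs

-- comments*, config line, comments*, data line*
dataFile :: Parser DataFile
dataFile = do
  skipMany ignorable
  cfg <- configLine          -- a missing config line makes this fail
  skipMany ignorable
  rs <- many dataLine
  eof
  return (DataFile cfg rs)

-- Reduce any Parsec error to the "error line X" form described above.
parseFile :: String -> Either String DataFile
parseFile input = case parse dataFile "input" input of
  Left err -> Left ("error line " ++ show (sourceLine (errorPos err)))
  Right f  -> Right f
```

As written, the sketch follows the grammar strictly, so it reports a blank or comment-only line between data lines as an error. Using `many (dataLine <* skipMany ignorable)` would relax that.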
>
> Parse time is somewhat critical only because I'll have multiple files to parse, so while 5-10 seconds is OK for one file, I have to multiply that by 5-10.
>
> I've seen several comments saying that parsec can be slow, but so far I've been unable to find anything that quantifies "slow".
>
> Any opinions on which would be better for my application (although I think I've just talked myself into using attoparsec)?
>
> In particular, am I going to get at least reasonable "error on line X" error handling using attoparsec?
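One library-agnostic way to get "error line X" is to split the input into lines yourself, number them, and run a per-line parser on each. `parseLines` below is a hypothetical helper. With Attoparsec, `parseOnly (lineParser <* endOfInput)` from Data.Attoparsec.Text already has the `Text -> Either String a` shape it expects. Here `lineParser` stands for your own line parser, and `endOfInput` makes the whole line match. The config-line state would still need handling, for example by treating the first non-comment line separately before mapping over the rest.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- only so callers can pass Text literals
import qualified Data.Text as T

-- Hypothetical helper: run a per-line parser on each line of the input and
-- stop at the first failure, tagging it with its 1-based line number.
parseLines :: (T.Text -> Either String a) -> T.Text -> Either String [a]
parseLines parseLine input = mapM step (zip [1 :: Int ..] (T.lines input))
  where
    step (n, l) = case parseLine l of
      Left msg -> Left ("error line " ++ show n ++ ": " ++ msg)
      Right x  -> Right x
```

Because `mapM` in `Either` stops at the first `Left`, only the first bad line is reported. That matches the "failing out with error line X" requirement.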
>
>
> Thanks,
>
> Brian
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
--
https://lambda.xyz