[Haskell-cafe] parsec or attoparsec for 40-50MB text files ?

Raphael Gaschignard dasuraga at gmail.com
Mon Jun 8 01:36:51 UTC 2015


Off-topic, but since we are talking about Parsec and Attoparsec: is there a
way to make Parsec backtrack by default, as if every alternative were
wrapped in try?
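
To make the question concrete, here is a minimal sketch of the behaviour I
mean, assuming the parsec package. The operator (<||>) and the two keyword
parsers are hypothetical names of my own, not part of Parsec. As far as I
know, wrapping every alternative in try like this tends to make error
messages less precise, so treat it as an illustration rather than a
recommendation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper, not part of Parsec: alternation that rewinds the
-- input whenever the left branch fails, similar to attoparsec's (<|>).
infixr 1 <||>
(<||>) :: Parser a -> Parser a -> Parser a
p <||> q = try p <|> q

-- Alternation through the backtracking helper.
keyword :: Parser String
keyword = string "let" <||> string "lambda"

-- Plain Parsec alternation, for comparison.
keywordNoTry :: Parser String
keywordNoTry = string "let" <|> string "lambda"

main :: IO ()
main = do
  -- string "let" consumes the 'l' before failing on 'a', so (<|>) never
  -- tries the second branch and the whole parse fails.
  print (parse keywordNoTry "" "lambda")
  -- With the helper the input is rewound and "lambda" is accepted.
  print (parse keyword "" "lambda")
```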

On Mon, Jun 8, 2015 at 9:23 AM Chris Wong <lambda.fairy at gmail.com> wrote:

> Hi Brian,
>
> Parsec and Attoparsec have very similar interfaces (afaik the only
> difference is that Attoparsec backtracks by default, so the "try"
> combinator is a no-op) so there's no harm in trying both.
>
> Alternatively: if the data format is simple enough, you can write the
> parser by hand. The Data.Text.Read module may help if you pursue this
> option. [1]
>
> Chris
>
> [1]:
> https://hackage.haskell.org/package/text-1.2.1.1/docs/Data-Text-Read.html
>
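
For the hand-written route, a minimal sketch of parsing one data line with
Data.Text.Read might look like the following. The '#' comment marker is an
assumption (the original post does not say how comments are written), and
stripComment and dataLine are hypothetical names used only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (double)

-- ASSUMPTION: a comment starts with '#' and runs to the end of the line.
stripComment :: Text -> Text
stripComment = T.takeWhile (/= '#')

-- Parse one data line: one or more whitespace-separated doubles,
-- optionally followed by a comment.
dataLine :: Text -> Either String [Double]
dataLine line =
  case T.words (stripComment line) of
    [] -> Left "no values"
    ws -> mapM field ws
  where
    -- 'double' returns the parsed value plus the unconsumed rest of the
    -- field; anything left over means the field was not a clean number.
    field w = case double w of
      Right (x, rest)
        | T.null rest -> Right x
        | otherwise   -> Left ("trailing input: " ++ T.unpack rest)
      Left err        -> Left err

main :: IO ()
main = do
  print (dataLine "1.5 2 -3e2  # three values")
  print (dataLine "1.5 oops")
```

The length of the returned list gives the number of values on the line,
which matters for this format.
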
> On Mon, Jun 8, 2015 at 11:04 AM,  <briand at aracnet.com> wrote:
> > Hi,
> >
> > My file is a pretty straightforward text file with a small amount of
> > somewhat annoying state:
> >
> > comments*
> > config line
> > comments*
> > data line*
> >
> > If there is no config line, it's an error. The data lines can have a
> > variable number of values, and it matters how many values there are (hey,
> > it's not my file format!). The data lines can also have a comment at the
> > end.
> >
> > My initial thought was to go with parsec, but the data files could be as
> > large as 40-50MB, and upon further reading it really seemed like
> > attoparsec would be better. Error handling wouldn't need to be too
> > sophisticated: if a data line has something other than one or more
> > floating point values and the optional comment, failing out with
> > "error line X" is fine.
> >
> > Parse time is somewhat critical only because I'll have multiple files to
> > parse, so while 5-10 seconds is OK for one file, I have to multiply that
> > by 5-10.
> >
> > I've seen several comments saying that parsec can be slow, but so far
> > I've been unable to find anything that quantifies "slow".
> >
> > Any opinions on which would be better for my application (although I
> > think I've just talked myself into using attoparsec)?
> >
> > In particular: am I going to get at least reasonable "error on line X"
> > error handling using attoparsec?
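
One way to get "error line X" out of attoparsec is to split the input into
lines and run a small parser on each, counting as you go. A minimal sketch,
assuming the attoparsec and text packages, covering only the data-line
section (not the leading comments or the config line). The '#' comment
marker is an assumption, and comment, hspace, dataLine and parseDataLines
are hypothetical names of my own.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

-- ASSUMPTION: a comment starts with '#' and runs to the end of the line.
comment :: Parser ()
comment = char '#' *> skipWhile (const True)

-- Skip spaces and tabs.
hspace :: Parser ()
hspace = skipWhile isHorizontalSpace

-- One or more doubles, an optional trailing comment, then end of the line.
dataLine :: Parser [Double]
dataLine = hspace *> many1 (double <* hspace) <* optional comment <* endOfInput

-- Parse each line separately, reporting the 1-based number of the first
-- line that fails.
parseDataLines :: Text -> Either String [[Double]]
parseDataLines = mapM one . zip [1 :: Int ..] . T.lines
  where
    one (n, l) = case parseOnly dataLine l of
      Right xs -> Right xs
      Left _   -> Left ("error line " ++ show n)

main :: IO ()
main = do
  print (parseDataLines "1.0 2.0\n3.5 # note\n")  -- Right [[1.0,2.0],[3.5]]
  print (parseDataLines "1.0 2.0\noops\n")        -- Left "error line 2"
```
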
> >
> >
> > Thanks,
> >
> > Brian
> >
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>
>
>
> --
> https://lambda.xyz