[Haskell-cafe] parsec or attoparsec for 40-50MB text files ?

João Cristóvão jmacristovao at gmail.com
Mon Jun 8 08:28:51 UTC 2015


You may want to try:
https://hackage.haskell.org/package/attoparsec-parsec
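
For the "error line X" requirement in the quoted question, here is a minimal
sketch using plain attoparsec, run one line at a time so the line number is
always at hand. Several things in it are assumptions, not part of the original
description: comments start with '#', the config line is kept as opaque text
and is recognised only by not being a row of numbers, and the names Parsed,
dataLine and parseDoc are made up for illustration. It is also more lenient
than the stated format, since it skips comment lines anywhere.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative  (optional)
import           Data.Attoparsec.Text (Parser, char, double, endOfInput,
                                       isHorizontalSpace, parseOnly, sepBy1,
                                       skipWhile, takeText, takeWhile1)
import           Data.Text            (Text)
import qualified Data.Text            as T

-- Hypothetical result type: config line as opaque text, plus the data rows.
data Parsed = Parsed { config :: Text, rows :: [[Double]] }
  deriving (Eq, Show)

-- One or more doubles separated by spaces/tabs, then an optional '#' comment.
-- The '#' comment marker is an assumption; the thread does not name one.
dataLine :: Parser [Double]
dataLine = do
  xs <- double `sepBy1` takeWhile1 isHorizontalSpace
  skipWhile isHorizontalSpace
  _ <- optional (char '#' *> takeText)
  endOfInput
  pure xs

-- Number the lines first, then drop blank and comment lines, so a failure
-- can still be reported against the original line number.
parseDoc :: Text -> Either String Parsed
parseDoc input =
  case filter (not . isComment . snd) numbered of
    [] -> Left "error: no config line"
    ((_, cfg) : rest)
      | isData cfg -> Left "error: no config line"
      | otherwise  -> Parsed cfg <$> traverse row rest
  where
    numbered    = zip [1 :: Int ..] (map T.strip (T.lines input))
    isComment l = T.null l || "#" `T.isPrefixOf` l
    isData      = either (const False) (const True) . parseOnly dataLine
    row (n, l)  = either (const (Left ("error line " ++ show n))) Right
                         (parseOnly dataLine l)
```

Because each row comes back as a list, the number of values per data line is
just the length of that list. Reading the file with Data.Text.IO.readFile and
passing the result to parseDoc should be enough to try this on a real file.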

João

2015-06-08 2:36 GMT+01:00 Raphael Gaschignard <dasuraga at gmail.com>:

> Off-topic, but since we are talking about Parsec/Attoparsec, is there a way
> to have try by default in Parsec as well?
>
> On Mon, Jun 8, 2015 at 9:23 AM Chris Wong <lambda.fairy at gmail.com> wrote:
>
>> Hi Brian,
>>
>> Parsec and Attoparsec have very similar interfaces (afaik the only
>> difference is that Attoparsec backtracks by default, so the "try"
>> combinator is a no-op) so there's no harm in trying both.
>>
>> Alternatively: if the data format is simple enough, you can write the
>> parser by hand. The Data.Text.Read module may help if you pursue this
>> option. [1]
>>
>> Chris
>>
>> [1]:
>> https://hackage.haskell.org/package/text-1.2.1.1/docs/Data-Text-Read.html
>>
>> On Mon, Jun 8, 2015 at 11:04 AM,  <briand at aracnet.com> wrote:
>> > Hi,
>> >
>> > My file is a pretty straightforward text file with a small amount of
>> > somewhat annoying state:
>> >
>> > comments*
>> > config line
>> > comments*
>> > data line*
>> >
>> > If there is no config line, it's an error. The data lines can have a
>> > variable number of values, and it matters how many values there are (hey,
>> > it's not my file format!). The data lines can also have a comment at the
>> > end.
>> >
>> > My initial thought was to go with parsec, but the data files could be as
>> > large as 40-50MB, and upon further reading it really seemed like attoparsec
>> > would be better. Error handling wouldn't be too sophisticated: if a data
>> > line has something other than one or more floating-point values and the
>> > optional comment, failing out with "error line X" is fine.
>> >
>> > Parse time is somewhat critical only because I'll have multiple files
>> > to parse, so while 5-10 seconds is OK for one file, I have to multiply that
>> > by 5-10.
>> >
>> > I've seen several comments saying that parsec can be slow, but so far
>> > I've been unable to find anything that quantifies "slow".
>> >
>> > Any opinions on which would be better for my application (although I
>> > think I've just talked myself into using attoparsec)?
>> >
>> > In particular, am I going to get at least reasonable "error on line X"
>> > error handling using attoparsec?
>> >
>> >
>> > Thanks,
>> >
>> > Brian
>> >
>> >
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe at haskell.org
>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>
>>
>>
>> --
>> https://lambda.xyz
>>
>
>
>
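
For the hand-written route mentioned in the quoted message (Data.Text.Read),
a minimal sketch of reading one data line might look like the following. The
'#' comment marker and the names readRow and readRows are assumptions made for
illustration; the thread does not specify them.

```haskell
import           Data.Text      (Text)
import qualified Data.Text      as T
import           Data.Text.Read (double)

-- Read one or more whitespace-separated doubles from a single line,
-- ignoring everything from an (assumed) '#' comment marker onwards.
readRow :: Text -> Either String [Double]
readRow line =
  case T.words (T.takeWhile (/= '#') line) of
    [] -> Left "no values"
    ws -> traverse field ws
  where
    -- 'double' returns the parsed value and the unconsumed rest of the
    -- input; a field is only accepted if nothing is left over.
    field w = case double w of
      Right (x, rest)
        | T.null rest -> Right x
        | otherwise   -> Left ("trailing characters in " ++ show w)
      Left err        -> Left err

-- Apply readRow to numbered lines, reporting the first bad line by number.
readRows :: [(Int, Text)] -> Either String [[Double]]
readRows = traverse row
  where
    row (n, l) = either (const (Left ("error line " ++ show n))) Right (readRow l)
```

Pairing the lines with their numbers up front (for example with
zip [1 ..] . T.lines) keeps the numbers right even if comment lines are
filtered out before calling readRows.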