[Haskell-cafe] parsec2 vs. parsec3... again

Sat Jan 15 01:23:02 CET 2011

On Thu, Jan 13, 2011 at 12:15 AM, Evan Laforge <qdunkan at gmail.com> wrote:
> Well, I tried it... and it's still slower!
>
> parsec2, String: (a little faster since last time since I have new computer)
>        total time  =        9.10 secs   (455 ticks @ 20 ms)
>        total alloc = 2,295,837,512 bytes  (excludes profiling overheads)
>
> attoparsec-text, Data.Text:
>        total time  =       14.72 secs   (736 ticks @ 20 ms)
>        total alloc = 2,797,672,844 bytes  (excludes profiling overheads)

Interesting.

> Just in case there's some useful criticism, here's one of the busier parsers:
>
> p_unsigned_float :: A.Parser Double
> p_unsigned_float = do
>    i <- A.takeWhile Char.isDigit
>    f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit)
>    if (Text.null i && Text.null f) then mzero else do
>    case (dec i, dec f) of
>        (Just i', Just f') -> return $ fromIntegral i'
>            + fromIntegral f' / fromIntegral (10 ^ (Text.length f))
>        _ -> mzero
>    where
>    dec :: Text.Text -> Maybe Int
>    dec s
>        | Text.null s = Just 0
>        | otherwise = case Text.Read.decimal s of
>            Right (d, rest) | Text.null rest -> Just d
>            _ -> Nothing
>

I built a benchmark from this code.  It's in the recently created
attoparsec-text darcs repo [1,2].  The test input is a 2.7 MiB file
with many numbers to be parsed.  The attoparsec-text package was
installed using -O (Cabal's default) and the test program was compiled
with ghc -hide-package parsec-3.1.0 --make -O2.

Using parsers that return the parsed number as a double and then sum
everything up, I get the following timings:

attoparsec_text_builtin
   2,241,038,864 bytes allocated in the heap
              46 MB total memory in use (1 MB lost due to fragmentation)
  MUT   time    1.10s  (  1.13s elapsed)
  GC    time    0.15s  (  0.20s elapsed)
  Total time    1.25s  (  1.32s elapsed)

attoparsec_text_laforge
   1,281,603,768 bytes allocated in the heap
             101 MB total memory in use (2 MB lost due to fragmentation)
  MUT   time    0.58s  (  0.62s elapsed)
  GC    time    0.47s  (  0.54s elapsed)
  Total time    1.05s  (  1.16s elapsed)

parsec_laforge
   1,558,621,208 bytes allocated in the heap
              47 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time    0.82s  (  0.84s elapsed)
  GC    time    0.46s  (  0.51s elapsed)
  Total time    1.27s  (  1.35s elapsed)

'attoparsec_text_builtin' uses Data.Attoparsec.Text.double, available
in the darcs version of the library.  It handles more cases, such as
exponents, so it is expected to be slower than your version.
'attoparsec_text_laforge' and 'parsec_laforge' are very similar to
the parser you gave in your e-mail, with some modifications (e.g.
Text.Read.decimal can't be used with Strings).  Using attoparsec-text
is faster and allocates less, but for some reason the faster version
uses a lot more total memory (101 MB vs. 47 MB).

As the total memory figures were strange, I created a different
version that parses the input but does not create any Doubles.
Instead of summing them, it counts how many Doubles would have been
parsed.  These are the results:

attoparsec_text_laforge_discarding
     985,843,696 bytes allocated in the heap
              25 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time    0.38s  (  0.39s elapsed)
  GC    time    0.07s  (  0.10s elapsed)
  Total time    0.45s  (  0.49s elapsed)

parsec_laforge_discarding
   1,471,829,664 bytes allocated in the heap
              28 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time    0.66s  (  0.68s elapsed)
  GC    time    0.44s  (  0.46s elapsed)
  Total time    1.10s  (  1.14s elapsed)

Now attoparsec-text is more than twice as fast, allocates even less
memory and the total memory figures seem right.
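
The difference between the summing and the discarding harness can be
sketched roughly as below.  This is not the code from Double.hs [2]:
it uses only base, and unsignedFloat, sumAll and countAll are
hypothetical names standing in for the real attoparsec-text and
parsec parsers.  In this sketch the counting variant never forces the
Double, thanks to laziness; whether the real benchmark avoids the
work in the same way is an assumption.

```haskell
import Data.Char (isDigit)
import Data.List (foldl')

-- Hypothetical stand-in for the benchmarked parsers: reads one unsigned
-- float such as "12.5" from the front of a String and returns the value
-- together with the unconsumed rest of the input.
unsignedFloat :: String -> Maybe (Double, String)
unsignedFloat s =
    let (i, r1) = span isDigit s
        -- Accept a fraction only if '.' is followed by at least one digit.
        (f, r2) = case r1 of
            ('.' : r) | (ds@(_ : _), r') <- span isDigit r -> (ds, r')
            _ -> ("", r1)
    in if null i && null f
        then Nothing
        else Just (toNum i + toNum f / 10 ^ length f, r2)
  where
    toNum :: String -> Double
    toNum = foldl' (\acc c -> acc * 10 + fromIntegral (fromEnum c - fromEnum '0')) 0

-- Summing variant: every parsed Double is demanded by the addition.
sumAll :: [String] -> Double
sumAll = foldl' step 0
  where
    step acc l = case unsignedFloat l of
        Just (d, _) -> acc + d
        Nothing -> acc

-- Discarding variant: only success or failure is inspected, so the
-- Double inside the result stays an unevaluated thunk.
countAll :: [String] -> Int
countAll = foldl' step 0
  where
    step n l = case unsignedFloat l of
        Just _ -> n + 1
        Nothing -> n

main :: IO ()
main = do
    let input = lines "1.5\n20\n0.25\nabc\n"
    print (sumAll input)
    print (countAll input)
```

Running both variants with +RTS -s on the same input is one way to
compare their allocation and GC figures.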

Bottom line: I think this benchmark doesn't really represent the kind
of workload your parser has.  Can you reproduce these results on your
system?

Cheers! =)

[1] http://patch-tag.com/r/felipe/attoparsec-text/
[2] http://patch-tag.com/r/felipe/attoparsec-text/snapshot/current/content/pretty/benchmarks/Double.hs

-- 
Felipe.