[Haskell-cafe] parsec2 vs. parsec3... again
Felipe Almeida Lessa
felipe.lessa at gmail.com
Sat Jan 15 01:23:02 CET 2011
On Thu, Jan 13, 2011 at 12:15 AM, Evan Laforge <qdunkan at gmail.com> wrote:
> Well, I tried it... and it's still slower!
>
> parsec2, String: (a little faster since last time since I have new computer)
> total time = 9.10 secs (455 ticks @ 20 ms)
> total alloc = 2,295,837,512 bytes (excludes profiling overheads)
>
> attoparsec-text, Data.Text:
> total time = 14.72 secs (736 ticks @ 20 ms)
> total alloc = 2,797,672,844 bytes (excludes profiling overheads)
Interesting.
> Just in case there's some useful criticism, here's one of the busier parsers:
>
> p_unsigned_float :: A.Parser Double
> p_unsigned_float = do
> i <- A.takeWhile Char.isDigit
> f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit)
> if (Text.null i && Text.null f) then mzero else do
> case (dec i, dec f) of
> (Just i', Just f') -> return $ fromIntegral i'
> + fromIntegral f' / fromIntegral (10 ^ (Text.length f))
> _ -> mzero
> where
> dec :: Text.Text -> Maybe Int
> dec s
> | Text.null s = Just 0
> | otherwise = case Text.Read.decimal s of
> Right (d, rest) | Text.null rest -> Just d
> _ -> Nothing
>
I've tried creating a benchmark using this code. It's on the recently
created attoparsec-text darcs repo [1,2]. There is a 2.7 MiB test
file with many numbers to be parsed. The attoparsec-text package was
installed using -O (Cabal's default) and the test program was compiled
with ghc -hide-package parsec-3.1.0 --make -O2.
Using parsers that return the parsed number as a double and then sum
everything up, I get the following timings:
attoparsec_text_builtin
2,241,038,864 bytes allocated in the heap
46 MB total memory in use (1 MB lost due to fragmentation)
MUT time 1.10s ( 1.13s elapsed)
GC time 0.15s ( 0.20s elapsed)
Total time 1.25s ( 1.32s elapsed)
attoparsec_text_laforge
1,281,603,768 bytes allocated in the heap
101 MB total memory in use (2 MB lost due to fragmentation)
MUT time 0.58s ( 0.62s elapsed)
GC time 0.47s ( 0.54s elapsed)
Total time 1.05s ( 1.16s elapsed)
parsec_laforge
1,558,621,208 bytes allocated in the heap
47 MB total memory in use (0 MB lost due to fragmentation)
MUT time 0.82s ( 0.84s elapsed)
GC time 0.46s ( 0.51s elapsed)
Total time 1.27s ( 1.35s elapsed)
'attoparsec_text_builtin' uses Data.Attoparsec.Text.double available
on the darcs version of the library. It tries to handle more cases,
like exponents, and thus it is expected to be slower than your
version. 'attoparsec_text_laforge' and 'parsec_laforge' are very
similar to the one you gave in your e-mail, but with some
modifications (e.g. Text.Read.decimal can't be used with Strings).
Using attoparsec-text is faster and allocates less, but for some
reason the faster version takes up a lot more memory.
As the total memory figures were strange, I created a different
version that parses the input but does not create any Doubles.
Instead of summing them, the number of Doubles (if they were parsed)
is counted. These are the results:
attoparsec_text_laforge_discarding
985,843,696 bytes allocated in the heap
25 MB total memory in use (0 MB lost due to fragmentation)
MUT time 0.38s ( 0.39s elapsed)
GC time 0.07s ( 0.10s elapsed)
Total time 0.45s ( 0.49s elapsed)
parsec_laforge_discarding double_test.txt +RTS -s
1,471,829,664 bytes allocated in the heap
28 MB total memory in use (0 MB lost due to fragmentation)
MUT time 0.66s ( 0.68s elapsed)
GC time 0.44s ( 0.46s elapsed)
Total time 1.10s ( 1.14s elapsed)
Now attoparsec-text is more than twice faster, allocates even less
memory and the total memory figures seem right.
Bottom line: I think this benchmark doesn't really represent the kind
of workload your parser has. Can you reproduce these results on your
system?
Cheers! =)
[1] http://patch-tag.com/r/felipe/attoparsec-text/
[2] http://patch-tag.com/r/felipe/attoparsec-text/snapshot/current/content/pretty/benchmarks/Double.hs
--
Felipe.
More information about the Haskell-Cafe
mailing list