[Haskell-cafe] More fun with micro-benchmarks and optimizations. (GHC vs Perl)

Wed Jul 23 15:01:24 EDT 2008

coreyoconnor:
> I have the need to regularly write tiny programs that analyze output
> logs. The output logs don't have a consistent formatting so I
> typically choose Perl for these tasks.
> 
> The latest instance of having to write such a program was simple
> enough I figured I'd try my hand at using Haskell instead. The short
> story is that I could quickly write a Haskell program that achieved
> the goal. Yay! However, the performance was ~8x slower than a
> comparable Perl implementation. With a lot of effort I got the Haskell
> version to only 2x slower. A lot of the optimization was done with
> guesses that the performance difference was due to how each line was
> being read from the file. I couldn't determine much using GHC's
> profiler.

    {-# OPTIONS -fbang-patterns #-}

    import qualified Data.ByteString.Char8 as S
    import Data.ByteString.Lex.Double
    import Debug.Trace

    main = print . go 0 =<< S.getContents
      where
        go !n !s = case readDouble str of
                Nothing     -> n
                Just (k,t)  -> let delta = 1.0 / k in go (n+delta) t
            where
                (_, xs) = S.break ((==) ':') s
                str     = S.drop 2 xs

It uses the bytestring-lexing package on Hackage to read the Doubles 
out,

    $ ghc --make Fast.hs -O2 -fvia-C -optc-O2
    $ time  ./Fast < test.out
    3155.626666664377
    ./Fast < test.out  0.07s user 0.01s system 97% cpu 0.078 total

So that's twice as fast as the perl entry on my box,

    $ time perl Sum.pl < test.out
    Duration (sec): 3155.62666666438
    perl Sum.pl < test.out  0.15s user 0.03s system 100% cpu 0.180 total

Note that the safe Double lexer uses a bit of copying, so
we can in fact do better still with a non-copying Double parser,
but that's only for the hardcore.

-- Don