[Haskell-cafe] More fun with micro-benchmarks and optimizations.
(GHC vs Perl)
coreyoconnor at gmail.com
Wed Jul 23 14:26:51 EDT 2008
Sounds great! Thanks for the advice. :-)
On Wed, Jul 23, 2008 at 1:18 PM, Don Stewart <dons at galois.com> wrote:
>> I have the need to regularly write tiny programs that analyze output
>> logs. The output logs don't have a consistent formatting so I
>> typically choose Perl for these tasks.
>> The latest instance of having to write such a program was simple
>> enough I figured I'd try my hand at using Haskell instead. The short
>> story is that I could quickly write a Haskell program that achieved
>> the goal. Yay! However, the performance was ~8x slower than a
>> comparable Perl implementation. With a lot of effort I got the Haskell
>> version to only 2x slower. A lot of the optimization was done with
>> guesses that the performance difference was due to how each line was
>> being read from the file. I couldn't determine much using GHC's
>> I still have two questions after all this:
>> - Can I get a Haskell implementation as fast as the Perl?
>> - What do I need to do to get GHC's profiler to provide me usable
>> information? Telling me that 98% of the time was in "main" is not very
>> enlightening ;-)
>> All the code and data for this little experiment I've placed here:
>> My first, and shortest, version is SumTiny.hs. The optimized version
>> is SumFast.hs.
>> The long version for the curious:
>> The (cleaned up) data was a 78k line file consisting of lines like the
>> framerate (prev == no pts): 15
>> framerate (delta): 25
>> framerate (invalid timebase): 12.5
>> ... and so on.
>> The need was for a program that calculated 1.0 / framerate for each
>> line and produced the sum. Easy no?
>> My straightforward Haskell solution was:
>> import Text.Regex.Posix
>> main = do
>> f_lines <- readFile "test.out" >>= return . lines
>> let duration = foldl add_line 0.0 f_lines
>> add_line sum line =
>> let [[_,m]] = line =~ "([0-9.]+)"
>> framerate = read m
>> delta = 1.0 / framerate
>> in sum + delta
>> putStrLn $ "Duration (sec): " ++ show duration
> Could you try using lazy and/or strict bytestrings. For *any* IO
> performance issue, that should be your first stop.
> See, e.g., this sum-file example,
> which outperforms C on the benchmark.
> The second point would be to avoid Text.Regex.Posix, which is
> algorithmically much worse than the PCRE regexes used by perl. Instead,
> try the pcre-light or regex-pcre packages, which have both better
> complexity, and operate on efficient bytestrings.
> Finally, there's some small details about writing efficient loops. Give
> type declarations for atomic types like Int and Double, and use strict
> folds or explicit recursion, as in this post,
> For your task here, with those simple changes, it should be relatively
> easy to produce competitive-with-C performance.
> -- Don
More information about the Haskell-Cafe