[Haskell-cafe] More fun with micro-benchmarks and optimizations. (GHC vs Perl)

Corey O'Connor coreyoconnor at gmail.com
Wed Jul 23 14:26:51 EDT 2008


Sounds great! Thanks for the advice. :-)

-Corey

On Wed, Jul 23, 2008 at 1:18 PM, Don Stewart <dons at galois.com> wrote:
> coreyoconnor:
>> I have the need to regularly write tiny programs that analyze output
>> logs. The output logs don't have a consistent formatting so I
>> typically choose Perl for these tasks.
>>
>> The latest instance of having to write such a program was simple
>> enough I figured I'd try my hand at using Haskell instead. The short
>> story is that I could quickly write a Haskell program that achieved
>> the goal. Yay! However, the performance was ~8x slower than a
>> comparable Perl implementation. With a lot of effort I got the Haskell
>> version to only 2x slower. A lot of the optimization was done with
>> guesses that the performance difference was due to how each line was
>> being read from the file. I couldn't determine much using GHC's
>> profiler.
>>
>> I still have two questions after all this:
>>  - Can I get a Haskell implementation as fast as the Perl?
>>  - What do I need to do to get GHC's profiler to provide me usable
>> information? Telling me that 98% of the time was in "main" is not very
>> enlightening ;-)
>>
>> All the code and data for this little experiment I've placed here:
>> http://tothepowerofdisco.com/repo/sum_optimization/
>> My first, and shortest, version is SumTiny.hs. The optimized version
>> is SumFast.hs.
>>
>> The long version for the curious:
>>
>> The (cleaned up) data was a 78k line file consisting of lines like the
>> following:
>> framerate (prev == no pts): 15
>> framerate (delta): 25
>> framerate (invalid timebase): 12.5
>> ... and so on.
>>
>> The need was for a program that calculated 1.0 / framerate for each
>> line and produced the sum. Easy no?
>>
>> My straightforward Haskell solution was:
>> -------------------------------------------------------------------
>> import Text.Regex.Posix
>>
>> main = do
>>     f_lines <- readFile "test.out" >>= return . lines
>>     let duration = foldl add_line 0.0 f_lines
>>         add_line sum line =
>>             let [[_,m]] = line =~ "([0-9.]+)"
>>                 framerate = read m
>>                 delta = 1.0 / framerate
>>             in sum + delta
>>     putStrLn $ "Duration (sec): " ++ show duration
>
> Could you try using lazy and/or strict bytestrings. For *any* IO
> performance issue, that should be your first stop.
>
> See, e.g., this sum-file example,
>
>    http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=ghc&id=6
>
> which outperforms C on the benchmark.
>
> The second point would be to avoid Text.Regex.Posix, which is
> algorithmically much worse than the PCRE regexes used by perl. Instead,
> try the pcre-light or regex-pcre packages, which have both better
> complexity, and operate on efficient bytestrings.
>
> Finally, there's some small details about writing efficient loops. Give
> type declarations for atomic types like Int and Double, and use strict
> folds or explicit recursion, as in this post,
>
>    http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast
>
> For your task here, with those simple changes, it should be relatively
> easy to produce competitive-with-C performance.
>
> -- Don
>



-- 
-Corey O'Connor


More information about the Haskell-Cafe mailing list