[Haskell-cafe] Floyd Warshall performance (again)
Max Bolingbroke
batterseapower at hotmail.com
Fri Apr 16 04:58:32 EDT 2010
For what is is worth:
$ ghc -cpp -O2 -ddump-asm Main.hs > Main.s
$ time ./a.out
Allocating ...
Allocating ... done
real 0m39.487s
user 0m39.258s
sys 0m0.150s
$ ~/Programming/Checkouts/ghc.llvm/inplace/bin/ghc-stage2 -cpp -fllvm
-O2 Main.hs
$ time ./a.out
Allocating ...
Allocating ... done
real 0m20.443s
user 0m20.281s
sys 0m0.101s
So you have an order of magnitude improvement with the LLVM backend.
It looks to me like your Core code is near-optimal, so the performance
problem is all down to the backend to fix up.
However:
$ gcc -std=c99 -O2 Main.c
$ time ./a.out
real 0m9.120s
user 0m9.030s
sys 0m0.035s
We still have a long way to go!
Cheers,
Max
On 16 April 2010 09:06, Mathieu Boespflug <mboes at tweag.net> wrote:
> Dear haskell-cafe,
>
> I implemented the Floyd Warshall algorithm for finding the shortest
> path in a dense graph in Haskell, but noted the performance was
> extremely poor compared to C. Even using mutable unboxed arrays it was
> running about 30 times slower. I rewrote the program several times, to
> arrive at the following definition:
>
> module Main where
>
> import Data.Array.Base
> import Data.Array.ST
> import Data.Array.Unboxed
> import Control.Monad.ST
> import Control.Monad
> import System.IO
> import GHC.Exts
>
> #define SIZE 1500
>
> -- Actual graph is immaterial for this benchmark.
> graph = []
>
> shortestPath :: [(Int, Int, Int)] -> UArray Int Int
> shortestPath g = runSTUArray $ do
> let mnew = newArray (0, SIZE * SIZE) 1
> mread arr i j = unsafeRead arr (i * SIZE + j)
> mwrite arr i j x = unsafeWrite arr (i * SIZE + j) x
> unsafeIOToST $ hSetBuffering stdout LineBuffering
> unsafeIOToST $ putStrLn "Allocating ..."
> pm <- mnew
> unsafeIOToST $ putStrLn "Allocating ... done"
> let loop1 SIZE = return ()
> loop1 k = let loop2 SIZE = return ()
> loop2 i = let loop3 SIZE = return ()
> loop3 j = do
> xij <- mread pm i j
> xik <- mread pm i k
> xkj <- mread pm k j
> mwrite pm i j (min xij (xik + xkj))
> loop3 (j + 1)
> in loop3 0 >> loop2 (i + 1)
> in loop2 0 >> loop1 (k + 1)
> loop1 0
> return pm
>
> main = shortestPath graph `seq` return ()
>
> These 3 nested loops run a lot faster than generating a list of IO
> actions and then running them using sequence_. But still this program
> runs 3 times slower than it's C counterpart:
>
> $ time ./FloydWarshall
> Allocating ...
> Allocating ... done
> ./FloydWarshall 45,75s user 0,09s system 97% cpu 46,815 total
>
> $ time ./a.out
> ./a.out 14,92s user 0,03s system 98% cpu 15,230 total
>
> The source code for the C program is given below. Using the vector
> package instead of GHC's inbuilt ST arrays doesn't improve the
> situation any, neither does compiling via C. Any idea why the rather
> unidiomatic Haskell program is still 3 times slower, despite using
> unchecked indexing and the fact that the three nested tail recursive
> functions above shoud behave exactly as the 3 nested for loops in the
> C program?
>
> Best regards,
>
> Mathieu
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
More information about the Haskell-Cafe
mailing list