[Haskell-cafe] fast Eucl. dist. - Haskell vs C
Don Stewart
dons at galois.com
Mon May 18 09:50:15 EDT 2009
kenneth.hoste:
> Hello,
>
> For a while now, I've been trying to come up with a fast Haskell-only
> function which implements Euclidean distance in n-dimensional space.
>
> So far, I've been disappointed by the performance of my best effort
> in Haskell, compared to C. I'm hoping some of the Haskell experts
> and/or performance gurus on this list can help me out on resolving this,
> and also maybe shed some light on some strange (imho) things I've run
> into.
>
> My current best try uses the uvector package, has two 'vectors' of type
> (UArr Double) as input, and relies on the sumU and zipWithU functions
> which use streaming to compute the result:
>
> dist_fast :: UArr Double -> UArr Double -> Double
> dist_fast p1 p2 = sumDs `seq` sqrt sumDs
> where
> sumDs = sumU ds
> ds = zipWithU euclidean p1 p2
> euclidean x y = d*d
> where
> d = x-y
The problem in your uvector code is the use of lists, rather than uvector
generators. Replace [1..n] with enumFromTo:
import Control.Monad
import System.Environment
import System.IO
import Data.Array.Vector
dist :: UArr Double -> UArr Double -> Double
dist p1 p2 = sumU (zipWithU euclidean p1 p2)
where
euclidean x y = d*d where d = x-y
main = do
[dim] <- map read `fmap` getArgs
print $
dist
(enumFromToFracU 1.0 dim)
(enumFromToFracU 1.0 dim)
Now the entire thiing will fuse to a loop.
$s$wfold_s1RR :: Double# -> Double# -> Double# -> Double#
$s$wfold_s1RR =
\ (sc_s1RH :: Double#)
(sc1_s1RI :: Double#)
(sc2_s1RJ :: Double#) ->
case >## sc1_s1RI a5_s1QR of wild4_a1tn {
False ->
case >## sc_s1RH a5_s1QR of wild5_X1vx {
False ->
let {
x1_a1Jg [ALWAYS Just L] :: Double#
x1_a1Jg = -## sc1_s1RI sc_s1RH } in
$s$wfold_s1RR
(+## sc_s1RH 1.0)
(+## sc1_s1RI 1.0)
(+## sc2_s1RJ (*## x1_a1Jg x1_a1Jg));
True -> sc2_s1RJ
};
True -> sc2_s1RJ
}; } in
case $s$wfold_s1RR 1.0 1.0 0.0 of ww_s1QH { __DEFAULT ->
a19 (D# ww_s1QH)
and this assembly:
$ ghc -O2 -fvia-C -optc-O3
s1T9_info:
movsd 5(%rbx), %xmm7
ucomisd %xmm7, %xmm6
ja .L15
ucomisd %xmm7, %xmm5
jbe .L18
.L14:
.L15:
movsd (%rbp), %xmm5
leaq 8(%rbp), %rbp
jmp *(%rbp)
.L18:
movapd %xmm6, %xmm7
subsd %xmm5, %xmm7
mulsd %xmm7, %xmm7
addsd (%rbp), %xmm7
movsd %xmm7, (%rbp)
movsd .LC0(%rip), %xmm7
addsd %xmm7, %xmm5
addsd %xmm7, %xmm6
jmp s1T9_info
Which I'd wager will match the C, which has to allocate the two arrays (GHC
essentially decides it doesn't need the arrays any more.
-- Don
More information about the Haskell-Cafe
mailing list