[Haskell-cafe] de-sugared code? (branch from: fast Eucl. dist. - Haskell vs C)

Sam Martin sam.martin at geomerics.com
Mon May 18 11:17:30 EDT 2009

```Hi Don (and cafe),

Given the example you just posted, is there a simple way to generate the de-sugared haskell / core / STG / labelled-assembly versions of a piece of haskell code? For instance, how did you generate the content below? I guess this is the core language version?

I'm a C/C++ coder and looking for the equivalent of "Show Disassembly".

Cheers,
Sam

-----Original Message-----
Sent: Mon 18/05/2009 14:50
To: Kenneth Hoste

kenneth.hoste:
> Hello,
>
> For a while now, I've been trying to come up with a fast Haskell-only
> function which implements Euclidean distance in n-dimensional space.
>
> So far, I've been disappointed by the performance of my best effort
> in Haskell, compared to C. I'm hoping some of the Haskell experts
> and/or performance gurus on this list can help me out on resolving this,
> and also maybe shed some light on some strange (imho) things I've run
> into.
>
> My current best try uses the uvector package, has two 'vectors' of type
> (UArr Double)  as input, and relies on the sumU and zipWithU functions
> which use streaming to compute the result:
>
> dist_fast :: UArr Double -> UArr Double -> Double
> dist_fast p1 p2 = sumDs `seq` sqrt sumDs
>         where
>                 sumDs         = sumU ds
>                 ds            = zipWithU euclidean p1 p2
>                 euclidean x y = d*d
>                         where
>                                 d = x-y

The problem in your uvector code is the use of lists, rather than uvector
generators. Replace [1..n] with enumFromTo:

import System.Environment
import System.IO
import Data.Array.Vector

dist :: UArr Double -> UArr Double -> Double
dist p1 p2 = sumU (zipWithU euclidean p1 p2)
where
euclidean x y = d*d where d = x-y

main = do
[dim] <- map read `fmap` getArgs

print \$
dist
(enumFromToFracU 1.0 dim)
(enumFromToFracU 1.0 dim)

Now the entire thiing will fuse to a loop.

\$s\$wfold_s1RR :: Double# -> Double# -> Double# -> Double#

\$s\$wfold_s1RR =
\ (sc_s1RH :: Double#)
(sc1_s1RI :: Double#)
(sc2_s1RJ :: Double#) ->
case >## sc1_s1RI a5_s1QR of wild4_a1tn {
False ->
case >## sc_s1RH a5_s1QR of wild5_X1vx {
False ->
let {
x1_a1Jg [ALWAYS Just L] :: Double#

x1_a1Jg = -## sc1_s1RI sc_s1RH } in
\$s\$wfold_s1RR
(+## sc_s1RH 1.0)
(+## sc1_s1RI 1.0)
(+## sc2_s1RJ (*## x1_a1Jg x1_a1Jg));
True -> sc2_s1RJ
};
True -> sc2_s1RJ
}; } in
case \$s\$wfold_s1RR 1.0 1.0 0.0 of ww_s1QH { __DEFAULT ->
a19 (D# ww_s1QH)

and this assembly:

\$ ghc -O2 -fvia-C -optc-O3

s1T9_info:
movsd       5(%rbx), %xmm7
ucomisd     %xmm7, %xmm6
ja  .L15
ucomisd     %xmm7, %xmm5
jbe .L18
.L14:
.L15:
movsd       (%rbp), %xmm5
leaq        8(%rbp), %rbp
jmp *(%rbp)
.L18:
movapd      %xmm6, %xmm7
subsd       %xmm5, %xmm7
mulsd       %xmm7, %xmm7
movsd       %xmm7, (%rbp)
movsd       .LC0(%rip), %xmm7
jmp s1T9_info

Which I'd wager will match the C, which has to allocate the two arrays (GHC
essentially decides it doesn't need the arrays any more.

-- Don
_______________________________________________