[Haskell-cafe] fast Eucl. dist. - Haskell vs C
kenneth.hoste at ugent.be
Tue May 19 03:19:23 EDT 2009
On May 18, 2009, at 15:28 , Claus Reinke wrote:
>> My current best try uses the uvector package, has two 'vectors' of
>> (UArr Double) as input, and relies on the sumU and zipWithU
>> which use streaming to compute the result:
>> dist_fast :: UArr Double -> UArr Double -> Double
>> dist_fast p1 p2 = sumDs `seq` sqrt sumDs
>> sumDs = sumU ds
>> ds = zipWithU euclidean p1 p2
>> euclidean x y = d*d
>> d = x-y
> You'll probably want to make sure that 'euclidian' is specialized to
> the types you need (here 'Double'), not used overloaded for 'Num a=>a'
> (check -ddump-tc, or -ddump-simpl output).
I understand from your later post that is was in fact specialized, but
how do I make sure it _is_ specialized? Can I just add a type signature
in the dist_fast definition for euclidean, or should I define euclidean
outside of dist_fast, with an explicit type signature?
If the latter, won't that hurt performance? Or should marking it INLINE
take care of that?
> After that, unrolling the fused fold loop (uvector internal) might
> a bit, but isn't there yet:
> And even if that gets implemented, it doesn't apply directly to your
> case, where the loop is in a library, but you might want to control
> unrolling in your client code. Having the loop unrolled by a default
> factor (8x or so) should help for loops like this, with little
This seems rather serious, and might be one of the bigger reasons why
I'm getting nowhere close to C in terms of performance...
The loop body is ridiculously small, so it would make sense to
unroll it somewhat to help avoid the loop overhead.
However, it seems like GHC isn't able to do that now.
Is there any way to unroll the loop myself, to speed things up?
Seems hard, because I'm using uvector...
Paris research group - ELIS - Ghent University, Belgium
email: kenneth.hoste at elis.ugent.be
More information about the Haskell-Cafe