[Haskell-cafe] Data.Complex.magnitude slow?
stefan kersten
sk at k-hornz.de
Thu Jul 17 12:56:52 EDT 2008
On 17.07.2008, at 17:42, Ian Lynagh wrote:
> On Thu, Jul 17, 2008 at 05:18:01PM +0200, Henning Thielemann wrote:
>> Complex.magnitude must prevent overflows, that is, if you just square
>> 1e200::Double you get an overflow, although the end result may be also
>> around 1e200. I guess, that to this end Complex.magnitude will separate
>> mantissa and exponent, but this is done via Integers, I'm afraid.
>
> Here's the code:
>
>   {-# SPECIALISE magnitude :: Complex Double -> Double #-}
>   magnitude :: (RealFloat a) => Complex a -> a
>   magnitude (x:+y) = scaleFloat k
>                        (sqrt ((scaleFloat mk x)^(2::Int) +
>                               (scaleFloat mk y)^(2::Int)))
>     where k  = max (exponent x) (exponent y)
>           mk = - k
>
> So the slowdown may be due to the scaling, presumably to prevent
> overflow as you say. However, the e^(2 :: Int) may also be causing a
> slowdown, as (^) is lazy in its first argument; I'm not sure if there is
> a rule that will rewrite that to e*e. Stefan, perhaps you can try timing
> with the above code, and also with:
>
>   {-# SPECIALISE magnitude :: Complex Double -> Double #-}
>   magnitude :: (RealFloat a) => Complex a -> a
>   magnitude (x:+y) = scaleFloat k
>                        (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
>     where k     = max (exponent x) (exponent y)
>           mk    = - k
>           sqr x = x * x
>
> and let us know what the results are?

thanks ian, here are the absolute runtimes (non-instrumented code)
and the corresponding entries in the profile:

c_magnitude0 (Data.Complex.magnitude)          0m7.249s
c_magnitude1 (non-scaling version)             0m1.176s
c_magnitude2 (scaling version, strict square)  0m3.278s

                  inherited
              %time   %alloc
c_magnitude0   91.6     90.2
c_magnitude1   41.7     49.6
c_magnitude2   81.5     71.1
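
for reference, a minimal sketch of what the two hand-written variants look
like. the names magnitudeNaive and magnitudeScaled are hypothetical, and the
body of the non-scaling version is an assumption (the plain sqrt (x*x + y*y)
formula); it is not shown anywhere in this thread. the scaling version
follows ian's second snippet, specialised to Double.

```haskell
import Data.Complex (Complex((:+)), magnitude)

-- Hypothetical name. Assumed shape of the non-scaling version:
-- the plain formula, which overflows once x*x or y*y exceeds the Double range.
magnitudeNaive :: Complex Double -> Double
magnitudeNaive (x :+ y) = sqrt (x * x + y * y)

-- Hypothetical name. Scaling version with an explicit square, following the
-- second snippet quoted above: scale both parts down by the larger binary
-- exponent, take the square root, then scale the result back up.
magnitudeScaled :: Complex Double -> Double
magnitudeScaled (x :+ y) =
    scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k     = max (exponent x) (exponent y)
    mk    = negate k
    sqr z = z * z
```

in ghci, magnitudeNaive (1e200 :+ 1e200) evaluates to Infinity, while
magnitudeScaled (1e200 :+ 1e200) stays finite and agrees with
Data.Complex.magnitude. so the non-scaling version is faster but does not
compute the same function near the ends of the Double range.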

interestingly, just pasting the original ghc library implementation seems to
slow things down considerably (0m12.264s) when compiling with

  -O2
  -funbox-strict-fields
  -fvia-C
  -optc-O2
  -fdicts-cheap
  -fno-method-sharing
  -fglasgow-exts

when leaving out -fdicts-cheap and -fno-method-sharing, the execution time for
the pasted library code drops to 0m6.873s. it seems that some options that are
useful (or even necessary?) for stream fusion rule reduction may produce
non-optimal code in other cases?

<sk>