stefan kersten sk at k-hornz.de
Thu Jul 17 12:56:52 EDT 2008

```
On 17.07.2008, at 17:42, Ian Lynagh wrote:
> On Thu, Jul 17, 2008 at 05:18:01PM +0200, Henning Thielemann wrote:
>> Complex.magnitude must prevent overflows; that is, if you just square
>> 1e200::Double you get an overflow, although the end result may also be
>> around 1e200. I guess that, to this end, Complex.magnitude separates
>> mantissa and exponent, but this is done via Integers, I'm afraid.
>
> Here's the code:
>
> {-# SPECIALISE magnitude :: Complex Double -> Double #-}
> magnitude :: (RealFloat a) => Complex a -> a
> magnitude (x:+y) =  scaleFloat k
>                      (sqrt ((scaleFloat mk x)^(2::Int) + (scaleFloat mk y)^(2::Int)))
>                     where k  = max (exponent x) (exponent y)
>                           mk = - k
>
> So the slowdown may be due to the scaling, presumably to prevent
> overflow as you say. However, the e^(2 :: Int) may also be causing a
> slowdown, as (^) is lazy in its first argument; I'm not sure if there is
> a rule that will rewrite that to e*e. Stefan, perhaps you can try timing
> with the above code, and also with:
>
> {-# SPECIALISE magnitude :: Complex Double -> Double #-}
> magnitude :: (RealFloat a) => Complex a -> a
> magnitude (x:+y) =  scaleFloat k
>                      (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
>                     where k  = max (exponent x) (exponent y)
>                           mk = - k
>                           sqr x = x * x
>
> and let us know what the results are?

thanks ian, here are the absolute runtimes (non-instrumented code)
and the corresponding entries in the profile:

c_magnitude0 (Data.Complex.magnitude)           0m7.249s
c_magnitude1 (non-scaling version)              0m1.176s
c_magnitude2 (scaling version, strict square)   0m3.278s

                 inherited
                 %time  %alloc
c_magnitude0     91.6   90.2
c_magnitude1     41.7   49.6
c_magnitude2     81.5   71.1

interestingly, just pasting the original ghc library implementation seems
to slow things down considerably (0m12.264s) when compiling with

-O2
-funbox-strict-fields
-fvia-C
-optc-O2
-fdicts-cheap
-fno-method-sharing
-fglasgow-exts

when leaving out -fdicts-cheap and -fno-method-sharing, the execution time
for the pasted library code drops to 0m6.873s. so it seems that some
options that are useful (or even necessary?) for stream fusion rule
reduction may produce non-optimal code in other cases?

<sk>

```
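The following is a minimal sketch, not taken from the thread, that shows why the scaling in `magnitude` is there at all. `naiveMagnitude` and `scaledMagnitude` are hypothetical names. It assumes `Double` only. `scaledMagnitude` mirrors the strict-square variant Ian proposed, where both components are scaled by 2^(-k) before squaring. Squaring directly overflows to `Infinity` for components around 1e154 and above, and underflows to 0 for very small ones.

```haskell
import Data.Complex (Complex ((:+)))

-- Naive magnitude: squares the components directly. For Double the sum
-- of squares overflows to Infinity once a component exceeds roughly
-- 1.3e154, and underflows to 0 for components around 1e-200.
naiveMagnitude :: Complex Double -> Double
naiveMagnitude (x :+ y) = sqrt (x * x + y * y)

-- Scaled magnitude with a strict square (hypothetical name), following the
-- second variant in the thread: divide both components by 2^k, where k is
-- the larger binary exponent, square with v * v instead of (^ 2), then
-- multiply the square root back by 2^k.
scaledMagnitude :: Complex Double -> Double
scaledMagnitude (x :+ y) =
  scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k = max (exponent x) (exponent y)
    mk = negate k
    sqr v = v * v
```

Loaded into GHCi, `naiveMagnitude (1e200 :+ 1e200)` gives `Infinity`, while `scaledMagnitude (1e200 :+ 1e200)` gives about 1.414e200. The price is two `exponent` calls and three `scaleFloat` calls per value. In the Haskell 98 report's default `RealFloat` methods, these are defined via `decodeFloat`/`encodeFloat` with an `Integer` mantissa, which matches Henning's concern about Integers.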