[Haskell-beginners] The cost of generality,
or how expensive is realToFrac?
Daniel Fischer
daniel.is.fischer at web.de
Wed Sep 15 18:36:11 EDT 2010
On Wednesday 15 September 2010 20:50:13, Greg wrote:
> I hadn't come across rewrite rules yet. They definitely look like
> something worth learning,
Absolutely. GHC's optimiser is good, but there are a lot of cases where you
need to push it via rewrite rules if you write polymorphic code or if you
want to eliminate intermediate data structures (e.g. list fusion).
> though I'm not sure I'm prepared to start making custom versions
> of OpenGL.Raw...
Yes, if you can work around the issues without that, it's better to leave
it in peace :)
Though you might ask the maintainer for rewrite rules.
>
> It looks like I managed to put that battle off for another day, however.
> I did look at how realToFrac is implemented and (as you mention) it
> does the fromRational . toRational transform pair suggested in a number
> of sources, including Real World Haskell. Looking at what toRational is
> doing, creating a ratio of integers out of a float it seems like a crazy
> amount of effort to go through just to convert floating point numbers.
I just did some benchmarking.
I benchmarked
foldl' (+) 0 [convert (1 / intToDoub k) | k <- [1 .. 100000]]
where
intToDoub :: Int -> Double
intToDoub = fromIntegral
for several functions
convert :: Double -> Float (actually, the type has been
(RealFloat a, RealFloat b) => a -> b, but it was used at a = Double,
b = Float).
Everything was compiled with -O2, so the rewrite rules fired, in particular
intToDoub was replaced by a primop (int2Double#), so that one's ultra
cheap.
For convert = realToFrac (hence by the rewrite rules the primop
double2Float# was used), I got pretty good times, mean was 6.76 ms.
For convert = floatToFloat from below, the times were not too bad, with a
mean of 26.3 ms.
A factor of roughly four for this benchmark (the factor for the conversion
itself will be higher, but not exorbitantly) means it's usable in many
situations, but not in performance critical situations where the conversion
takes a significant amount of the running time. If you're converting to
draw stuff with OpenGL (or some other graphics library), the conversion
will take only a relatively small part of the time, so it's fine.
For convert = fromRational . toRational (so no rewrite rules), the times
were rather appalling: mean was 3.34 seconds.
A factor of nearly 500 versus double2Float#.
toRational is bad. Looking at
instance Real Double where
toRational x = (m%1)*(b%1)^^n
where (m,n) = decodeFloat x
b = floatRadix x
(same for Float), and the implementations of (^^) and (^),
{-# SPECIALISE (^) ::
Integer -> Integer -> Integer,
Integer -> Int -> Integer,
Int -> Int -> Int #-}
(^) :: (Num a, Integral b) => a -> b -> a
x0 ^ y0 | y0 < 0 = error "Negative exponent"
| y0 == 0 = 1
| otherwise = f x0 y0
where -- f : x0 ^ y0 = x ^ y
f x y | even y = f (x * x) (y `quot` 2)
| y == 1 = x
| otherwise = g (x * x) ((y - 1) `quot` 2) x
-- g : x0 ^ y0 = (x ^ y) * z
g x y z | even y = g (x * x) (y `quot` 2) z
| y == 1 = x * z
| otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)
-- | raise a number to an integral power
{-# SPECIALISE (^^) ::
Rational -> Int -> Rational #-}
(^^) :: (Fractional a, Integral b) => a -> b -> a
x ^^ n = if n >= 0 then x^n else recip (x^(negate n))
together with the multiplication and recip for Rationals, I have to say
ouch!
There's no special implementation and rewrite rule for powers of Rationals,
so on each multiplication in (^), the gcd of numerator and denominator is
calculated, *although as powers of the original numerator and denominator
they are guaranteed to be coprime*. Considering how slow a division of
Integers is, awwwwwww noooooo.
So let's look at a better implementation of toRational:
toRat :: RealFloat a => a -> Rational
toRat x = case decodeFloat x of
(m,e) -> case floatRadix x of
b -> if e < 0
then (m % (b^(negate e)))
else (m * b^e) :% 1
(inlined a better implementation of powers for Rationals).
Benchmarking convert = fromRational . toRat show a significant improvement,
the mean dropped to 2.75 seconds.
Still appalling, but it's a nice improvement and I don't see any quick
opportunities to improve that conversion.
So let's come to the last, fromRational. That's a compicated function, and
unfortunately it has to be and I've no good idea to improve it.
fromRational is really evil (in terms of clock cycles).
Replacing fromRational with a dummy that just forces the evaluation of its
argument and returns NaN, ±Infinity, or 0 for all real Rational values,
dummy . toRational had a mean of 623.5 ms and
dummy . toRat had a mean of 200.7 ms.
So toRat is a jolly good improvement over toRational, but it's still
awfully slow. And since fromRational takes much much longer anyway, it's a
not too impressive gain for realToFrac.
>
> Looking at the RealFloat class rather that Real and Fractional, it seems
> like this is a much more efficient way to go:
>
> floatToFloat :: (RealFloat a, RealFloat b) => a -> b
> floatToFloat = (uncurry encodeFloat) . decodeFloat
Yes, that's much more efficient, as witnessed by the benchmark results.
But.
>
> I substituted this in for realToFrac and I'm back to close to my
> original performance. Playing with a few test cases in ghci, it looks
> numerically equivalent to realToFrac.
>
> This begs the question though--
No. Sorry, but I can't bear that misuse:
http://en.wikipedia.org/wiki/Begging_the_question
It raises/demands/invites the question, but it doesn't beg it.
> am I doing something dangerous here?
Yes and no.
> Why isn't this the standard approach?
Because it will wreak unspeakable havoc when someone creates a RealFloat
instance with a floatRadix > 2. A floatRadix of 10 or some power of 2 (16,
256?) could be even reasonable.
But for conversions between RealFloat types with the same floatRadix, it's
sort of okay, only it clobbers NaNs and (for some conversions) Infinities.
However, realToFrac does that too.
>
> If I understand what's happening, decodeFloat and encodeFloat are
> breaking the floating point numbers up into their constituent parts--
> presumably by bit masking the raw binary.
Probably.
> That would explain the
> performance improvement. I suppose there is some implementation
> dependence here, but as long as the encode and decode are implemented as
> a matched set then I think I'm good.
Not entirely matched, for Float -> Float and Double -> Double,
NaN -> -Infinity, maybe denormalized values break too.
±Infinity is mapped to a finite value at Float -> Double
But since toRational uses decodeFloat and fromRational uses encodeFloat,
floatToFloat is in that respect no worse than realToFrac without rewrite
rules.
>
> Cheers--
> Greg
More information about the Beginners
mailing list