[Haskell-beginners] The cost of generality,
or how expensive is realToFrac?
Greg
greglists at me.com
Thu Sep 16 15:47:43 EDT 2010
Wow, thanks for all the analysis on this, Daniel!
So, I think the summary of your evaluation is that realToFrac does just fine in almost all cases due to careful optimization and outperforms my floatToFloat trick (which answers the question I flagrantly begged... ;~) except that I can't access those optimizations because the OpenGL types are hidden behind newtypes and buried in a library.
I've confirmed your results to make sure I could.
I went a step further and typed convert for GLclampf expecting to see unoptimized performance, but it ran just as fast as Double->Float.
convert :: Double -> GL.GLclampf
convert = realToFrac
And still ran faster than floatToFloat. However there's no denying that floatToFloat runs *much* faster than realToFrac in the larger application. Profiling shows floatToFloat taking about 50% of my CPU time, but the frame rate is close to (though not quite) 30 fps. Using realToFrac takes seconds just to display the first frame.
As it stands, floatToFloat is responsible for 50% of my CPU time which I think is mainly a tribute the OpenGL implementation, but still makes it the obvious target for optimization. Having gotten all excited about the new tool in my box, I wrote the following rule:
{-# RULES
"floatToFloat/id" floatToFloat=id
"floatToFloat x2" floatToFloat . floatToFloat = floatToFloat
#-}
Neither of which seems to fires in this application, but I did get the first one to fire by importing the same file into my benchmark.
The next obvious step is to optimize a level up in the call hierarchy, and rewrite coordToCoord2D since I know my original pair2vertex function was faster. So, I added this rule in the file where Vertex2 is made an instance of Coord2D:
{-# RULES
"coordToCoord2D/p2v2" coordToCoord2D = pair2vertex
#-}
which refers to two functions, each in different files (I don't think that matters, but mention it just in case)
pair2vertex :: (Num a) => (a,a) -> GL.Vertex2 a
pair2vertex (x,y) = GL.Vertex2 x y
coordToCoord2D :: (Coord2D a, Coord2D b) => a -> b
coordToCoord2D = fromCartesian2D . toCartesian2D
directly after my coordToCoord2D definition I have this rule as well:
{-# RULES
"coordToCoord2D/id" coordToCoord2D = id
"coordToCoord2D x2" coordToCoord2D . coordToCoord2D = coordToCoord2D
#-}
I get a compile time error that I can't make sense of. It's asking me to put a context on my rule, but I can't find any references on how to do that...
-----------
Could not deduce (Num a)
from the context (Coord2D (Vertex2 a), Coord2D (a, a))
arising from a use of `pair2vertex'
at GCB/OpenGL/Geometry.hs:32:40-50
Possible fix:
add (Num a) to the context of the RULE "coordToCoord2D/p2v2"
In the expression: pair2vertex
When checking the transformation rule "coordToCoord2D/p2v2"
-----------
The file:line:column is the "pair2vertex" token in the rule I list above.
Cheers--
Greg
On Sep 15, 2010, at 3:36 PM, Daniel Fischer wrote:
> On Wednesday 15 September 2010 20:50:13, Greg wrote:
>> I hadn't come across rewrite rules yet. They definitely look like
>> something worth learning,
>
> Absolutely. GHC's optimiser is good, but there are a lot of cases where you
> need to push it via rewrite rules if you write polymorphic code or if you
> want to eliminate intermediate data structures (e.g. list fusion).
>
>> though I'm not sure I'm prepared to start making custom versions
>> of OpenGL.Raw...
>
> Yes, if you can work around the issues without that, it's better to leave
> it in peace :)
> Though you might ask the maintainer for rewrite rules.
>
>>
>> It looks like I managed to put that battle off for another day, however.
>> I did look at how realToFrac is implemented and (as you mention) it
>> does the fromRational . toRational transform pair suggested in a number
>> of sources, including Real World Haskell. Looking at what toRational is
>> doing, creating a ratio of integers out of a float it seems like a crazy
>> amount of effort to go through just to convert floating point numbers.
>
> I just did some benchmarking.
>
> I benchmarked
>
> foldl' (+) 0 [convert (1 / intToDoub k) | k <- [1 .. 100000]]
>
> where
>
> intToDoub :: Int -> Double
> intToDoub = fromIntegral
>
> for several functions
>
> convert :: Double -> Float (actually, the type has been
> (RealFloat a, RealFloat b) => a -> b, but it was used at a = Double,
> b = Float).
> Everything was compiled with -O2, so the rewrite rules fired, in particular
> intToDoub was replaced by a primop (int2Double#), so that one's ultra
> cheap.
>
> For convert = realToFrac (hence by the rewrite rules the primop
> double2Float# was used), I got pretty good times, mean was 6.76 ms.
>
> For convert = floatToFloat from below, the times were not too bad, with a
> mean of 26.3 ms.
> A factor of roughly four for this benchmark (the factor for the conversion
> itself will be higher, but not exorbitantly) means it's usable in many
> situations, but not in performance critical situations where the conversion
> takes a significant amount of the running time. If you're converting to
> draw stuff with OpenGL (or some other graphics library), the conversion
> will take only a relatively small part of the time, so it's fine.
>
> For convert = fromRational . toRational (so no rewrite rules), the times
> were rather appalling: mean was 3.34 seconds.
> A factor of nearly 500 versus double2Float#.
>
> toRational is bad. Looking at
>
> instance Real Double where
> toRational x = (m%1)*(b%1)^^n
> where (m,n) = decodeFloat x
> b = floatRadix x
>
> (same for Float), and the implementations of (^^) and (^),
>
> {-# SPECIALISE (^) ::
> Integer -> Integer -> Integer,
> Integer -> Int -> Integer,
> Int -> Int -> Int #-}
> (^) :: (Num a, Integral b) => a -> b -> a
> x0 ^ y0 | y0 < 0 = error "Negative exponent"
> | y0 == 0 = 1
> | otherwise = f x0 y0
> where -- f : x0 ^ y0 = x ^ y
> f x y | even y = f (x * x) (y `quot` 2)
> | y == 1 = x
> | otherwise = g (x * x) ((y - 1) `quot` 2) x
> -- g : x0 ^ y0 = (x ^ y) * z
> g x y z | even y = g (x * x) (y `quot` 2) z
> | y == 1 = x * z
> | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)
>
> -- | raise a number to an integral power
> {-# SPECIALISE (^^) ::
> Rational -> Int -> Rational #-}
> (^^) :: (Fractional a, Integral b) => a -> b -> a
> x ^^ n = if n >= 0 then x^n else recip (x^(negate n))
>
> together with the multiplication and recip for Rationals, I have to say
> ouch!
> There's no special implementation and rewrite rule for powers of Rationals,
> so on each multiplication in (^), the gcd of numerator and denominator is
> calculated, *although as powers of the original numerator and denominator
> they are guaranteed to be coprime*. Considering how slow a division of
> Integers is, awwwwwww noooooo.
>
> So let's look at a better implementation of toRational:
>
> toRat :: RealFloat a => a -> Rational
> toRat x = case decodeFloat x of
> (m,e) -> case floatRadix x of
> b -> if e < 0
> then (m % (b^(negate e)))
> else (m * b^e) :% 1
>
> (inlined a better implementation of powers for Rationals).
>
> Benchmarking convert = fromRational . toRat show a significant improvement,
> the mean dropped to 2.75 seconds.
> Still appalling, but it's a nice improvement and I don't see any quick
> opportunities to improve that conversion.
>
> So let's come to the last, fromRational. That's a compicated function, and
> unfortunately it has to be and I've no good idea to improve it.
> fromRational is really evil (in terms of clock cycles).
> Replacing fromRational with a dummy that just forces the evaluation of its
> argument and returns NaN, ±Infinity, or 0 for all real Rational values,
>
> dummy . toRational had a mean of 623.5 ms and
> dummy . toRat had a mean of 200.7 ms.
>
> So toRat is a jolly good improvement over toRational, but it's still
> awfully slow. And since fromRational takes much much longer anyway, it's a
> not too impressive gain for realToFrac.
>
>>
>> Looking at the RealFloat class rather that Real and Fractional, it seems
>> like this is a much more efficient way to go:
>>
>> floatToFloat :: (RealFloat a, RealFloat b) => a -> b
>> floatToFloat = (uncurry encodeFloat) . decodeFloat
>
> Yes, that's much more efficient, as witnessed by the benchmark results.
> But.
>
>>
>> I substituted this in for realToFrac and I'm back to close to my
>> original performance. Playing with a few test cases in ghci, it looks
>> numerically equivalent to realToFrac.
>>
>> This begs the question though--
>
> No. Sorry, but I can't bear that misuse:
> http://en.wikipedia.org/wiki/Begging_the_question
>
> It raises/demands/invites the question, but it doesn't beg it.
>
>> am I doing something dangerous here?
>
> Yes and no.
>
>> Why isn't this the standard approach?
>
> Because it will wreak unspeakable havoc when someone creates a RealFloat
> instance with a floatRadix > 2. A floatRadix of 10 or some power of 2 (16,
> 256?) could be even reasonable.
>
> But for conversions between RealFloat types with the same floatRadix, it's
> sort of okay, only it clobbers NaNs and (for some conversions) Infinities.
> However, realToFrac does that too.
>
>>
>> If I understand what's happening, decodeFloat and encodeFloat are
>> breaking the floating point numbers up into their constituent parts--
>> presumably by bit masking the raw binary.
>
> Probably.
>
>> That would explain the
>> performance improvement. I suppose there is some implementation
>> dependence here, but as long as the encode and decode are implemented as
>> a matched set then I think I'm good.
>
> Not entirely matched, for Float -> Float and Double -> Double,
> NaN -> -Infinity, maybe denormalized values break too.
>
> ±Infinity is mapped to a finite value at Float -> Double
>
> But since toRational uses decodeFloat and fromRational uses encodeFloat,
> floatToFloat is in that respect no worse than realToFrac without rewrite
> rules.
>
>>
>> Cheers--
>> Greg
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20100916/7f7cf1ec/attachment-0001.html
More information about the Beginners
mailing list