[Haskell-beginners] The cost of generality,
or how expensive is realToFrac?
Daniel Fischer
daniel.is.fischer at web.de
Sun Sep 19 10:56:21 EDT 2010
On Sunday 19 September 2010 02:41:36, Greg wrote:
> > It would be interesting to see what core GHC produces for that (you
> > can get the core with the `-ddump-simpl' command line flag [redirect
> > stdout to a file] or with the ghc-core tool [available on hackage]).
> > If it runs as fast as realToFrac :: Double -> Float (with
> > optimisations), GHC must have rewritten realToFrac to double2Float#
> > and it should only do that if there are rewrite rules for GLclampf.
>
> I'm not sure if you literally meant you wanted to see the output or not,
Yes, but only if you were willing to take the trouble of producing it.
I actually was more interested in the core for the real app, but the core
for the toy benchmark is already interesting (see below).
> but I've attached a zip of the dump files and my simple source file.
> The dump file naming is cryptic, but the first letters refer to the
> definition of 'convert' where:
>
> fTF: use the floatToFloat function in the source file
> rTF: use the standard realToFrac
> fRtR: use (fromRational . toRational)
>
> The next three characters indicate the type signature of convert:
>
> d2f: Double -> Float
> d2g: Double -> GL.GLclampf
>
> I'd summarize the results, but apparently I took the blue pill and can't
> make heads or tails of what I'm seeing in the dump format...
>
Okay, for the results for the Double -> Float conversion,
fromRational . toRational took ~3.35 seconds
floatToFloat took 32 ms
realToFrac took 8 ms
(always compiled with -O2; the times are slightly higher than the criterion
benchmarking results from Wednesday/Thursday, that's probably because those
ran pre-warmed while today's run-once started up cold [and included a call
to getCPUTime]).
Now to GLclampf. I remembered that I had installed OpenGL with one of my
old GHCs (turned out to be 6.10.3), so I could also run the tests for
Double -> GLclampf.
Unsurprisingly, fromRational . toRational and floatToFloat had the same
performance as for Double -> Float. Equally unsurprisingly, were it not for
your results and the core you sent, realToFrac had the same performance as
fromRational . toRational.
In the core you sent for realToFrac :: Double -> GLclampf, we find the loop
for summing a list of GLclampf:
Rec {
$wlgo_r1wv :: GHC.Prim.Float#
-> [Graphics.Rendering.OpenGL.GL.BasicTypes.GLclampf]
-> GHC.Prim.Float#
GblId
[Arity 2
NoCafRefs
Str: DmdType LS]
$wlgo_r1wv =
\ (ww_s1vV :: GHC.Prim.Float#)
(w_s1vX :: [Graphics.Rendering.OpenGL.GL.BasicTypes.GLclampf]) ->
case w_s1vX of _ {
[] -> ww_s1vV;
: x_aVE xs_aVF ->
case x_aVE of _ { GHC.Types.F# y_a13F ->
$wlgo_r1wv (GHC.Prim.plusFloat# ww_s1vV y_a13F) xs_aVF
}
}
end Rec }
Wow, did you remove the casting annotations or does it really match a
GLclampf against the Float constructor F# without any ado?
If the latter, which compiler version have you?
Just for the record, 6.10.3 produced the same code, but with several levels
of casting from Float to GLclampf.
More interesting is the generation of the list:
Rec {
go_r1wx :: GHC.Prim.Int#
-> [Graphics.Rendering.OpenGL.GL.BasicTypes.GLclampf]
GblId
[Arity 1
NoCafRefs
Str: DmdType L]
go_r1wx =
\ (x_a13o :: GHC.Prim.Int#) ->
GHC.Types.:
@ Graphics.Rendering.OpenGL.GL.BasicTypes.GLclampf
(case GHC.Prim./## 1.0 (GHC.Prim.int2Double# x_a13o)
of wild2_a14i { __DEFAULT ->
GHC.Types.F# (GHC.Prim.double2Float# wild2_a14i)
})
(case x_a13o of wild_B1 {
__DEFAULT -> go_r1wx (GHC.Prim.+# wild_B1 1);
100000 ->
GHC.Types.[] @ Graphics.Rendering.OpenGL.GL.BasicTypes.GLclampf
})
end Rec }
Wowwowwow, it conses a Float to a list of GLclampf without even mentioning
a cast. Since it feels free to do that, no wonder that it uses
double2Float#.
Hrm, okay, perhaps a new version of OpenGL[Raw]? Nope, 2.4.0.1 and 1.1.0.1,
what I have with 6.10.3.
So, perhaps it's 6.12 vs. 6.10? Install OpenGL for 6.12.3, try, nope, same
as 6.10.3, the summing is identical except for the casting annotations, but
the generation goes through fromRational and toRational [expected, because
there are no rewrite rules in OpenGLRaw].
What compiler are you using? HEAD? The core doesn't look like HEAD's core
to me, but that might be because nothing except main is exported.
Okay, so I threw a couple of rewrite rules into OpenGLRaw, reinstalled and
reran, now realToFrac gets properly rewritten to double2Float# (with
casts).
> > In that case, the problem is probably that GHC doesn't see the
> > realToFrac applications because they're too deeply wrapped in your
> > coordToCoord2D calls.
Okay, your compiler *does* rewrite realToFrac :: Double -> GLclampf to
double2Float#, at least when the situation is simple enough, although there
are no rewrite rules in the package for that.
Looks like a fortuitous bug.
But it doesn't do the rewriting in the real app, so it's probably indeed
too deeply wrapped there.
> >
> > If that is the problem, it might help to use {-# INLINE #-} pragmas on
> > coordToCoord2D, fromCartesian2D and toCartesian2D.
> > Can you try with realToFrac and the {-# INLINE #-} pragmas?
>
> I tried inlining the functions you suggest with little effect. The
> realToFrac version (in this case I just set floatToFloat=realToFrac to
> save the search and replace effort) is just too heavily loaded to see
> any difference at all (98+% of CPU is spent in realToFrac). The same
> inlining using my definition of floatToFloat gave me a 10% improvement
> from 50% -> 46% of the CPU spent in floatToFloat and an inverse change
> in allocation to match.
>
> Best I can tell, the inlining is being recognized, but just not changing
> much.
>
Looking at the Coord stuff more closely, you'd probably need much more
inlining to get a good effect. And you probably need a bit more strictness
too.
============================================================
--Coord2D is a typeclass I created to hold 2D data
data Cartesian2D a = Cartesian2D a a deriving (Show, Eq, Read)
-- Needs testing, but I suspect
{-
data Cartesian2D a = Cartesian2D !a !a deriving (...)
or even
data Cartesian2D a = Cartesian2D {-# UNPACK #-} !a {-# UNPACK #-} !a
deriving (...)
-}
-- would have a beneficial effect.
{- Pair instances -}
instance (RealFloat a, RealFloat b) => Coord2D (a,b) where
xComponent = realToFrac . fst
yComponent = realToFrac . snd
fromCartesian2D p = ((xComponent p),(yComponent p))
-- That might be too lazy, perhaps
{-
xComponent (x,_) = realToFrac x
yComponent (_,y) = realToFrac y
fromCartesian2D (Cartesian2D x y) = (x,y)
-}
-- will be better
-- anyhow, maybe you need to inline all methods of Coord2D to get the rules
to fire:
class Coord2D a where
{-# INLINE xComponent #-}
xComponent :: (RealFloat b) => a -> b
{-# INLINE yComponent #-}
yComponent :: (RealFloat b) => a -> b
{-# INLINE toCartesian2D #-}
toCartesian2D :: (RealFloat b) => a -> Cartesian2D b
toCartesian2D p = Cartesian2D (xComponent p) (yComponent p)
{-# INLINE fromCartesian2D #-}
fromCartesian2D :: (RealFloat b) => Cartesian2D b -> a
-- I'm rather convinced inlining the component functions will be good, but
-- there's a good chance that they're small enough to be inlined anyway.
-- The inlining of the to/fromCratesian2D functions is doubtful, because
--and this function allows conversion between coordinate representations
coordToCoord2D :: (Coord2D a, Coord2D b) => a -> b
coordToCoord2D = fromCartesian2D . toCartesian2D
-- cries loudly for
{-# RULES
"toCart/fromCart" forall p. toCartesian2D (fromCartesian2D p) = p
#-}
-- whenever that's possible
-- so, perhaps first try to rewrite, whenever that's possible, afterwards
inline, hence
-- {-# INLINE [2] toCartesian2D #-}
-- {-# INLINE [2] fromCartesian2D #-}
-- {-# RULES
-- "toCart/fromCart" [~2] forall p. toCartesian (fromCartesian p) = p
-- #-}
-- dunno whether that works, but -ddump-simpl-stats should tell
============================================================
Finally, there's one other thing to try, with or without rules/inlining:
coordToVertex2 :: Coord2D a => a -> (GL.Vertex2 GL.GLclampf)
coordToVertex2 = coordToCoord2D
GLclampf is a newtype wrapper around a newtype wrapper around Float.
Coercing between newtype and original is supposed to be safe, so
import Unsafe.Coerce
floatToGLclampf :: Float -> GL.GLclampf
floatToGLclampf = unsafeCoerce
coordToVertex2 c =
case coordToCoord2D c of
(x,y) -> GL.Vertex2 (floatToGLclampf x) (floatToGLclampf y)
That way, we circumvent a potentially expensive call to
realToFrac :: a -> GLclampf
for a = Double or a = Float and split it into a no-op (unsafeCoerce) and a
hopefully cheap conversion to Float.
> >> And still ran faster than floatToFloat. However there's no denying
> >> that floatToFloat runs *much* faster than realToFrac in the larger
> >> application. Profiling shows floatToFloat taking about 50% of my CPU
> >
> > That's too much for my liking, a simple conversion from Double to
> > Float shouldn't take long, even if the Float is wrapped in newtypes
> > (after all, the newtypes don't exist at runtime).
>
> Agreed. The rest of the application right now isn't doing a lot of work
> yet though-- I'm generating (pre-calculating, if Haskell is doing it's
> job) a list of 360*180 points on a sphere and dumping that to OpenGL
> which should be doing most of the dirty work in hardware. I'm not
> entirely sure why floatToFloat recalculates every iteration and isn't
> just cached,
Code? Maybe you have to give a name for it to be cached.
> but I'm guessing it's because the floatToFloat is being
> done in an OpenGL callback within the IO monad. Eventually I'll be
> providing time-varying data anyway, so the conversions will have to be
> continuously recalculated in the end.
>
> That comes out to 65000 conversions every 30ms, or about 2 million
> conversions a second. I'd probably just leave it at that except, as
> you've demonstrated, there is at least a factor of 3 or 4 to be gained
> somehow-- realToFrac can provide it under the right conditions.
>
> >> {-# RULES
> >> "floatToFloat/id" floatToFloat=id
> >> "floatToFloat x2" floatToFloat . floatToFloat = floatToFloat
> >> #-}
I'm not sure how the rule-spotting works with compositions, whether it
matches `foo . bar' with `foo (bar x)' [one in the code, the other in the
rule], it might be necessary to give the rule in both forms.
> >>
> >> Neither of which seems to fires in this application,
> >
> > GHC reports fired rules with -ddump-simpl-stats.
> > Getting rules to fire is a little brittle, GHC does not try too hard
> > to match expressions with rules, and if several rules match, it
> > chooses one arbitrarily, so your rules may have been missed because
> > the actual code looked different (perhaps because other rewrite rules
> > fired first).
>
> Yeah, I've been looking at the -ddump-simp-stats output. If I'm reading
> the documentation right, rules are enabled simply by invoking ghc with
> -O or -O2, right?
Right, -O implies -fenable-rewrite-rules (and hence -O2 too).
On the other hand, you can't have rewrite-rules without -O [that is, you
can pass -fenable-rwerite-rules on the command line without -O, it will
just have no effect]. Presumably the flag exists for its negation, so you
can invoke GHC with -O -fno-enable-rewrite-rules to have the rules not
firing.
> I'm now not convinced any of my rewrite rules are
> firing-- or at least I can't seem to get them to again.
>
If they fire, -ddump-simpl-stats tells you, there's a piece like
9 RuleFired
1 ==#->case
1 >#
1 eftInt
1 fold/build
1 fromIntegral/Int->Double
1 int2Float#
1 realToFrac/Double->Float
1 unpack
1 unpack-list
in the dump, if it contains the name of one of your rules, it fired n
times, otherwise it didn't fire.
> No, that doesn't do it. I tried a few variations on that and it always
> chokes on the => symbol or whatever other syntax I try to use. The Num
> constraint was added because it was needed on related functions (3
> element vertices where the z was stuffed with 0, for example), so I got
> rid of those and the Num constraint. Doesn't matter, the rule still
> doesn't fire... =(
Might have been inlined before the rule got a chance to fire.
>
> Cheers--
> Greg
More information about the Beginners
mailing list